Abstract: The number of points $x = (x_1, x_2, \ldots, x_n)$ that lie in an integer cube $C$ in $\mathbb{R}^n$ and satisfy the constraints $\sum_j h_{ij}(x_j) = s_i$, $1 \le i \le d$, is approximated by an Edgeworth-corrected Gaussian formula based on the maximum entropy density $p$ on $x \in C$ that satisfies $E\sum_j h_{ij}(X_j) = s_i$, $1 \le i \le d$. Under $p$, the variables $X_1, X_2, \ldots, X_n$ are independent with densities of exponential form. Letting $S_i$ denote the random variable $\sum_j h_{ij}(X_j)$, conditional on $S = s$, $X$ is uniformly distributed over the integers in $C$ that satisfy $S = s$. The number of points in $C$ satisfying $S = s$ is $p\{S = s\}\exp(I(p))$, where $I(p)$ is the entropy of the density $p$. We estimate $p\{S = s\}$ by $p_Z(s)$, the density at $s$ of the multivariate Gaussian $Z$ with the same first two moments as $S$; and when $d$ is large we use in addition an Edgeworth factor that requires the first four moments of $S$ under $p$. The asymptotic validity of the Edgeworth-corrected estimate is proved and demonstrated for counting contingency tables with given row and column sums as the number of rows and columns approaches infinity, and demonstrated for counting the number of graphs with a given degree sequence as the number of vertices approaches infinity.
1 Maximum entropy estimation of the number of integer points

Let $x = (x_1, x_2, \ldots, x_n)$ be a vector in $\mathbb{R}^n$. For arbitrary functions $h_{ij}: \mathbb{R}^1 \to \mathbb{R}^1$ define $S_i = \sum_j h_{ij}(x_j)$, $1 \le i \le d$. Let $Q$ be counting measure on a cube $C$ of integers in $\mathbb{R}^n$. Consider the surface $S = s$ in $\mathbb{R}^n$ consisting of points $x$ that satisfy the sums $S_i = \sum_j h_{ij}(x_j) = s_i$, $1 \le i \le d$. The volume of the surface, $Q\{S = s\}$, is the number of points that lie in $C$ and in the surface $\{S = s\}$. For $P_U$ the uniform distribution on the cube $C$, $Q\{S = s\} = P_U\{S = s\}\,Q\{C\}$.

Let $X = (X_1, X_2, \ldots, X_n)$ be $n$ random variables uniformly distributed over the cube. Since the random variables are independent, the central limit theorem will apply to the sums $S_i = \sum_j h_{ij}(X_j)$ under suitable conditions on the $h_{ij}$. Thus we might approximate the probability $P_U\{S = s\}$ by $p_Z(s)$, the density at $s$ of a multivariate Gaussian $Z$ with the same first and second moments as $S$. We expect this approximation to work well when the mean of $S$ is close to the selected values $s$, but not so well in the tails of the distribution. Therefore we propose maximum entropy Gaussian estimation of the volume, using an approximating Gaussian with mean value $s$. This procedure is called exponential tilting; see, for example, [KT03].

The entropy of a discrete random variable $X$ having density $p$ (with respect to counting measure) is

$$I(p) = -E\{\log p(X)\}. \tag{1}$$

We find the maximum entropy distribution $P$ described in [J57], with density $p$ on a cube $C$ of integers in $\mathbb{R}^n$ satisfying $ES = s$. If there is a density of exponential form

$$P\{X = x\} = p(x) = \exp\Big\{\sum_{ij}\lambda_i h_{ij}(x_j) + \lambda_0\Big\}, \tag{2}$$

where the $\lambda_i$ are chosen to satisfy the expectations $ES = s$ and to ensure that $\sum_{x \in C} p(x) = 1$, then this density may be shown to be the unique maximum entropy density subject to the constraints $ES = s$.

Under $P$, the variables $X_1, X_2, \ldots, X_n$ are independent with densities

$$p_j(x_j) = \exp\Big\{\sum_i \lambda_i h_{ij}(x_j) + \nu_j\Big\}, \qquad \sum_j \nu_j = \lambda_0. \tag{3}$$

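And, conditional on $S = s$, $X$ is uniformly distributed over the integers $x$ in $C$ that satisfy $S = s$, with

$$p(x) = \exp\Big\{\sum_i \lambda_i s_i + \lambda_0\Big\} = \exp\{-I(p)\}, \tag{4}$$

since

$$I(p) = \sum_x[-p(x)\log p(x)] = -\sum_x p(x)\Big[\sum_{ij}\lambda_i h_{ij}(x_j) + \lambda_0\Big] = -\sum_i \lambda_i s_i - \lambda_0. \tag{5}$$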

Thus, for any $x$ that satisfies $S = s$,

$$Q\{S = s\} = P\{S = s\}/p(x) = P\{S = s\}\exp\{I(p)\}. \tag{6}$$

The entropy term in this formula was suggested in some special cases in [B09]. 

We again estimate $P\{S = s\}$ by $p_Z(s)$, the density at $s$ of a multivariate Gaussian $Z$ with the mean and covariance of $S$. The advantage in using the maximum entropy $P$ is that the mean of the Gaussian is $s$, so that "debiased" estimation takes place at the mean.
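As a minimal sketch of the estimate $Q\{S = s\} \approx p_Z(s)\exp\{I(p)\}$ in the simplest case $d = 1$, the Python below (our own illustration, not code from the paper) takes $C = \{0, \ldots, K\}^n$ and $h_{1j}(x) = x$, so that $S = x_1 + \cdots + x_n$. The maximum entropy density then tilts every coordinate by the same parameter $\theta$, found by bisection so that $ES = s$. The function names, the bisection bracket and the test sizes are hypothetical choices; the exact count comes from a simple dynamic program.

```python
import math


def exact_count(n, K, s):
    """Number of x in {0,...,K}^n with x_1 + ... + x_n = s (dynamic programming)."""
    counts = [1]  # counts[t] = number of partial vectors whose coordinates sum to t
    for _ in range(n):
        new = [0] * (len(counts) + K)
        for t, c in enumerate(counts):
            for x in range(K + 1):
                new[t + x] += c
        counts = new
    return counts[s] if 0 <= s < len(counts) else 0


def tilted_moments(theta, K):
    """Mean, variance and entropy of p(x) proportional to exp(theta * x) on {0,...,K}."""
    w = [math.exp(theta * x) for x in range(K + 1)]
    z = sum(w)
    p = [wi / z for wi in w]
    mean = sum(x * px for x, px in enumerate(p))
    var = sum((x - mean) ** 2 * px for x, px in enumerate(p))
    entropy = -sum(px * math.log(px) for px in p)
    return mean, var, entropy


def max_entropy_gaussian_count(n, K, s):
    """Estimate Q{S = s} by p_Z(s) * exp(I(p)) for the single sum S = x_1 + ... + x_n."""
    target = s / n  # assumed to lie strictly between 0 and K
    lo, hi = -30.0, 30.0
    for _ in range(200):  # bisection: the tilted mean is increasing in theta
        mid = 0.5 * (lo + hi)
        if tilted_moments(mid, K)[0] < target:
            lo = mid
        else:
            hi = mid
    _, var, entropy = tilted_moments(0.5 * (lo + hi), K)
    # I(p) = n * entropy; the Gaussian has mean s, so p_Z(s) = 1 / sqrt(2 pi Var S)
    return math.exp(n * entropy) / math.sqrt(2 * math.pi * n * var)
```

For example, `max_entropy_gaussian_count(30, 3, 30) / exact_count(30, 3, 30)` should be close to 1; the remaining discrepancy is of the kind that the Edgeworth factor discussed below is meant to remove.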

If the $h$ functions are just multiples, say $h_{ij}(x_j) = A_{ij}x_j$, then the maximum entropy density $p$ consists of independent exponential form densities

$$p_j(x) = \exp\{\theta_j x - c(\theta_j)\} \tag{7}$$

on the $x_j$, with canonical parameters $\theta_j = \sum_i \lambda_i A_{ij}$ and expectations $c'(\theta_j)$. The parameters $\lambda_i$ are chosen so that $\sum_j A_{ij}c'(\theta_j) = s_i$.

Because $p$ is maximum entropy, the $\theta_j$ may also be characterized [Ba09] as the unique maximizers over $\theta$ of the $p$-entropy $\sum_j[c(\theta_j) - \theta_j c'(\theta_j)]$ subject to $\sum_j A_{ij}c'(\theta_j) = s_i$. And then

$$Q\{S = s\} = P\{S = s\}\Big/\prod_j\exp\{\theta_j c'(\theta_j) - c(\theta_j)\} = P\{S = s\}\exp\{I(p)\}. \tag{8}$$


So far, we have followed the approach in [BH10a] of maximum entropy Gaussian approximation. However, when the number $d$ of sums $S_i$ approaches infinity, and the variances of the sums are $O(d)$, the relative error in the Gaussian approximation to the true density for the $i$th sum will typically be $P\{S_i = s_i\}/p_{Z_i}(s_i) - 1 = O(1/d)$, and the error in approximating the true density for $d$ sums will be about $(1 + O(1/d))^d - 1 = O(1)$. In order to get an accurate approximation we need to consider the Edgeworth corrections to the Gaussian approximation, which use the third and fourth cumulants of the distribution of $S$.

In [MW90] , McKay and Wormald produced an asymptotic formula for the num- 
ber of near regular graphs on n vertices with k edges, where k is proportional to 
n. They derive the formula by a saddlepoint approximation to Cauchy's integral 
for determining a coefficient in a generating function. Their generating function 
turns out to be the characteristic function of the sums S appropriate for this prob- 
lem. The maximum entropy Edgeworth approximation generalises their formula to 
graphs with widely varying degree sequences in [BH10b]. The maximum entropy
method can also be used to estimate the number of graphs with given degree se- 
quences and with additional edge specifications such as specified cliques or colorings 
of the graph. 

In [CM05], [GMW06], [CM07], [CGM08], [MG], Canfield, Greenhill, McKay, Wormald, and Wang extended the Cauchy integral approach to asymptotic enumeration of two-way contingency tables of integers in which the marginal sums are known, with the row sums nearly equal and the column sums nearly equal. The integers may be non-negative, or constrained to be 0-1. The maximum entropy Edgeworth approximation (see also [BH09]) generalises their formulae to the case of varying marginal sums. The formulae require the first four moments of certain sums of independent random variables. The maximum entropy table entries are independent geometric variables when the integers in the tables are non-negative, and independent Bernoulli variables when the integers are 0-1.

The advance in the maximum entropy Edgeworth approximation is that it provides a unified method for the problems mentioned above, and for generalisations of them, using a standard statistical approximation (see for example [K06]) based on the first four moments of sums of independent variables determined by the maximum entropy distributions.

Diaconis and Efron [DE85] study the distribution of a chi-square statistic for the uniform distribution over contingency tables with fixed margins. The numbers of rows and columns are fixed, but the total count approaches infinity. If instead the
table entries are bounded, but the numbers of rows and columns approach infin- 
ity, we expect that a maximum entropy approach should yield a valid asymptotic 
estimate of the distribution. Here the maximum entropy table entries are integer 
Gaussians: Gaussian variables, with arbitrary means and variances, constrained to 
be integers. 
2 The Edgeworth approximation for integer random variables of increasing dimensionality

Let $X_d$ be a sequence of $d$-dimensional integer random variables having mean 0. Suppose that the determinant of the lattice generated by values of $X_d$ having positive probability is $\Delta_d$. We wish to estimate the probability $P\{X_d = 0\}$ using the first four moments of $X_d$.

Define $Q_a(t) = 1$ if $\max_i |t_i| \le a$, and $Q_a(t) = 0$ if $\max_i |t_i| > a$.

We use the $d$-dimensional characteristic function $\phi_d(t) = E\exp(it'X_d)$, with $t$ a column vector in $\mathbb{R}^d$ and $t'$ the corresponding row vector:

$$P\{X_d = 0\} = (2\pi)^{-d}\int Q_\pi(t)\,\phi_d(t)\,dt. \tag{9}$$
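The inversion formula (9) can be checked numerically in one dimension. The sketch below (illustrative; the helper names are our own) evaluates $(2\pi)^{-1}\int_{-\pi}^{\pi}\phi(t)\,dt$ for a finitely supported integer variable with an equally spaced rule on $[-\pi, \pi]$, which is exact once the number of nodes exceeds the largest $|x|$ in the support.

```python
import cmath
import math


def char_fn(pmf, t):
    """phi(t) = E exp(i t X) for an integer-valued X given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


def prob_zero_by_inversion(pmf, num_points=None):
    """(2 pi)^{-1} * integral of phi over [-pi, pi], i.e. formula (9) with d = 1."""
    span = max(abs(x) for x in pmf)
    n = num_points or (2 * span + 1)
    # Equally spaced nodes on the circle: (1/n) * sum_k exp(i t_k x) vanishes for 0 < |x| < n
    total = sum(char_fn(pmf, -math.pi + 2 * math.pi * k / n) for k in range(n))
    return (total / n).real
```

For instance, `prob_zero_by_inversion({-1: 0.25, 0: 0.5, 1: 0.25})` recovers $P\{X = 0\} = 0.5$ up to rounding.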

The cumulant term $K_r^d(t)$ is the polynomial term of degree $r$ in the expansion $\log\phi_d(t) = \sum_r i^r K_r^d(t)/r!$. Specifically,

$$K_2^d(t) = E(t'X_d)^2, \qquad K_3^d(t) = E(t'X_d)^3, \qquad K_4^d(t) = E(t'X_d)^4 - 3(K_2^d(t))^2. \tag{10}$$
The variance-covariance matrix $V_d$ is determined by the second cumulant:

$$\sum_{i,j} t_i t_j V_d(i, j) = K_2^d(t). \tag{11}$$

Define $\kappa_3^2 = E_d\{K_3^d(t)^2\}$ and $\kappa_4 = E_d K_4^d(t)$, where the expectation $E_d$ is with respect to $t \sim N(0, V_d^{-1})$, a Gaussian variable with mean 0 and variance-covariance $V_d^{-1}$. The Edgeworth approximation to $P\{X_d = 0\}$ is

$$\hat P\{X_d = 0\} = \Delta_d(2\pi)^{-d/2}|V_d|^{-1/2}\exp(-\kappa_3^2/72 + \kappa_4/24). \tag{12}$$

The approximation consists of the density at zero of a Gaussian with variance- 
covariance Vd, multiplied by an Edgeworth term correcting for the departure from 
Gaussianity. 
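For $d = 1$ the quantities in (12) are explicit: with $t \sim N(0, 1/k_2)$, $K_3(t) = k_3t^3$ and $K_4(t) = k_4t^4$, so $\kappa_3^2 = 15k_3^2/k_2^3$ and $\kappa_4 = 3k_4/k_2^2$, where $k_2, k_3, k_4$ are the cumulants of $X$. The Python sketch below (our own illustration, with hypothetical function names) applies this to a centered binomial count $X = S - np$, for which $P\{X = 0\}$ is known exactly, and compares the plain Gaussian density at 0 with the Edgeworth-corrected value.

```python
import math


def edgeworth_point_prob(k2, k3, k4, lattice_det=1.0):
    """The d = 1 case of the Edgeworth approximation (12) to P{X = 0}.

    k2, k3, k4 are the second, third and fourth cumulants of X (with EX = 0).
    For t ~ N(0, 1/k2): E(k3 t^3)^2 = 15 k3^2 / k2^3 and E(k4 t^4) = 3 k4 / k2^2.
    """
    kappa3_sq = 15.0 * k3 ** 2 / k2 ** 3
    kappa4 = 3.0 * k4 / k2 ** 2
    gaussian = lattice_det / math.sqrt(2.0 * math.pi * k2)
    return gaussian * math.exp(-kappa3_sq / 72.0 + kappa4 / 24.0)


def binomial_example(n, p):
    """X = S - n p with S ~ Binomial(n, p) and n p an integer.

    Returns (exact P{X = 0}, Gaussian density at 0, Edgeworth approximation).
    """
    q = 1.0 - p
    k2 = n * p * q                        # Bernoulli cumulants, summed over n trials
    k3 = n * p * q * (q - p)
    k4 = n * p * q * (1.0 - 6.0 * p * q)
    m = round(n * p)
    exact = math.comb(n, m) * p ** m * q ** (n - m)
    gaussian = 1.0 / math.sqrt(2.0 * math.pi * k2)
    return exact, gaussian, edgeworth_point_prob(k2, k3, k4)
```

For $p = 1/2$ the correction factor is $\exp(-1/(4n))$, which agrees with the leading correction in Stirling's expansion of $\binom{n}{n/2}2^{-n}$.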

We will use the order of magnitude notation

$$f(d) = o(g(d)): \quad f(d)/g(d) \to 0 \text{ as } d \to \infty, \tag{13}$$

$$f(d) = O(g(d)): \quad \limsup_d |f(d)/g(d)| < \infty. \tag{14}$$

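Theorem 1 Let $E_d$ denote expectation with respect to $t \sim N(0, V_d^{-1})$. Suppose that for some $M$, with $\epsilon = M\sqrt{\log d/d}$,

(I) $\kappa_3^2 = O(1)$, $\kappa_4 = O(1)$,

(II) $E_d\{Q_\epsilon\exp[\tfrac{1}{12}K_4^d(t)]\} = O(1)$,

(III) $Q_\epsilon\big|\log\phi_d(t) - \sum_{r=2}^{4} i^r K_r^d(t)/r!\big| = o(1)$,

(IV) $E_d\{Q_\epsilon\exp[-\tfrac{i}{6}K_3^d(t) + \tfrac{1}{72}\kappa_3^2 + \tfrac{1}{24}K_4^d(t) - \tfrac{1}{24}\kappa_4]\} \to 1$ as $d \to \infty$,

(V) $\int (Q_\pi(t) - Q_\epsilon(t))|\phi_d(t)|\,dt \Big/ \int Q_\epsilon(t)|\phi_d(t)|\,dt = o(1)$.

Then $\hat P\{X_d = 0\}/P\{X_d = 0\} \to 1$ as $d \to \infty$.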

Comments on conditions:

The theorem does not prove much by itself; rather, it outlines a program for proving the validity of the approximation in particular cases.

Conditions I and II bound the third and fourth cumulants. Conditions III and IV require that the third and fourth cumulants affect the characteristic function integral through the summary cumulants $\kappa_3^2$, $\kappa_4$. Condition V requires that contributions to the characteristic function integral be negligible outside a small cube centered at 0. In particular, this forces the determinant of the lattice of possible values of $X_d$ to be 1 for $d$ large enough.

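Proof: Let $K_{34}(t) = -\tfrac{i}{6}K_3^d(t) + \tfrac{1}{72}\kappa_3^2 + \tfrac{1}{24}K_4^d(t) - \tfrac{1}{24}\kappa_4$. From I and II,

$$E_d\{Q_\epsilon|\exp K_{34}(t)|^2\} = E_d\{Q_\epsilon\exp[\tfrac{1}{36}\kappa_3^2 + \tfrac{1}{12}K_4^d(t) - \tfrac{1}{12}\kappa_4]\} = O(1),$$

so that

$$E_d\{Q_\epsilon\exp[K_{34}(t) + o(1)]\} - E_d\{Q_\epsilon\exp[K_{34}(t)]\} = o(1)\,E_d\{Q_\epsilon|\exp K_{34}(t)|\} = o(1)\big(E_d\{Q_\epsilon|\exp K_{34}(t)|^2\}\big)^{1/2} = o(1).$$

From III and IV,

$$\begin{aligned}\Delta_d(2\pi)^{-d}\int Q_\epsilon(t)\phi_d(t)\,dt\Big/\hat P\{X_d = 0\} &= E_d\{Q_\epsilon\exp[\tfrac12 K_2^d(t) + \log\phi_d(t) + \tfrac{1}{72}\kappa_3^2 - \tfrac{1}{24}\kappa_4]\}\\ &= E_d\{Q_\epsilon\exp[K_{34}(t) + o(1)]\} = E_d\{Q_\epsilon\exp[K_{34}(t)]\} + o(1) \to 1 \text{ as } d \to \infty.\end{aligned} \tag{15}$$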

Thus

$$\int Q_\epsilon\phi_d\Big/\Big\{(2\pi)^{d/2}|V_d|^{-1/2}\exp[-\tfrac{1}{72}\kappa_3^2 + \tfrac{1}{24}\kappa_4]\Big\} \to 1. \tag{16}$$

A similar argument shows that, since $|\exp[-iK_3^d(t)/3!]| = 1$,

$$\int Q_\epsilon|\phi_d|\Big/\Big\{(2\pi)^{d/2}|V_d|^{-1/2}\exp[\tfrac{1}{24}\kappa_4]\Big\} \to 1. \tag{17}$$

This shows that $\int Q_\epsilon|\phi_d| = O(1)\,\big|\int Q_\epsilon\phi_d\big|$. Thus from condition V,

$$\int Q_\pi\phi_d\Big/\int Q_\epsilon\phi_d \to 1. \tag{18}$$



We now show that condition V requires the determinant of the lattice to be 1 for $d$ large enough. In the contrary case, consider the reciprocal lattice in $d$ dimensions consisting of all vectors $a$ for which $a'X_d$ is an integer with probability one. The determinant of this lattice is the reciprocal of the determinant of the original lattice, and so the reciprocal determinant is less than or equal to 1/2. There must be a non-zero point in the reciprocal lattice which lies in the half-unit cube; thus there is a non-zero point $t = 2\pi a$ lying in the cube $Q_\pi(t) = 1$ for which $a'X_d$ is an integer. Now $\phi_d(t + u) = E\{\exp(i(t + u)'X_d)\} = E\{\exp(iu'X_d)\} = \phi_d(u)$, since $\exp(2\pi i a'X_d) = 1$. Thus the integral of $|\phi_d|$ in the neighbourhood of $t = 2\pi a$ equals its integral in the neighbourhood of 0, which contradicts V.
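The periodicity used in this argument is easy to see numerically. In the hypothetical one-dimensional example below, $X$ takes only even values, so the lattice generated by its values has determinant 2 and $a = 1/2$ satisfies $aX \in \mathbb{Z}$. The sketch (our own illustration) checks that $\phi(\pi + u) = \phi(u)$, so $|\phi|$ near $t = \pi$ is as large as near 0 and a condition like V cannot hold.

```python
import cmath
import math


def char_fn(pmf, t):
    """phi(t) = E exp(i t X) for an integer-valued X given as {value: probability}."""
    return sum(p * cmath.exp(1j * t * x) for x, p in pmf.items())


# Hypothetical example: X is supported on the even integers (lattice determinant 2),
# so t = 2 * pi * a with a = 1/2, i.e. t = pi, is a period of phi.
even_pmf = {-2: 0.25, 0: 0.5, 2: 0.25}
for u in (0.0, 0.1, 0.3):
    print(u, abs(char_fn(even_pmf, u)), abs(char_fn(even_pmf, math.pi + u)))
```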



Since $\Delta_d = 1$, combining (15) and (18) gives

$$\hat P\{X_d = 0\}/P\{X_d = 0\} = (1 + o(1))\int Q_\epsilon\phi_d\Big/\int Q_\pi\phi_d \to 1 \text{ as } d \to \infty, \tag{19}$$

which concludes the proof.
3 Numbers of contingency tables with given row and column sums

Consider a contingency table of non-negative integers $X_{jk}$, $1 \le j \le m$, $1 \le k \le n$, with row and column sums $R_j = \sum_k X_{jk}$, $C_k = \sum_j X_{jk}$. We wish to estimate the number of tables satisfying the constraints $R_j = r_j$, $C_k = c_k$. Define the dimension $d = (m + n - 1)$ integer vector $S_d$:

$$S_{jd} = R_j - r_j,\ 1 \le j \le m; \qquad S_{(k+m)d} = C_k - c_k,\ 1 \le k \le n - 1. \tag{20}$$

Following the program of section 2, the Edgeworth approximation begins with the maximum entropy distribution for $\{X_{jk}\}$ with expectations $ER_j = r_j$, $EC_k = c_k$, which consists of independent geometric variables with expectations $\mu_{jk}$:

$$P\{X_{jk} = x\} = \Big(\frac{\mu_{jk}}{1 + \mu_{jk}}\Big)^x\frac{1}{1 + \mu_{jk}}, \tag{21}$$

where $\log(1 + 1/\mu_{jk}) = \alpha_j + \beta_k$ and the parameters $\alpha_j, \beta_k$ are chosen so that

$$ER_j = \sum_k \mu_{jk} = r_j, \qquad EC_k = \sum_j \mu_{jk} = c_k. \tag{22}$$

The existence of parameters $\alpha_j, \beta_k$ satisfying the marginal constraints is shown in [B09]. The maximum entropy entries $\mu_{jk}$ are uniquely determined; $\alpha + c, \beta - c$ is a solution if and only if $\alpha, \beta$ is a solution.

The conditional distribution of $\{X_{jk}\}$ under the constraints $\{R = r, C = c\}$ (equivalently $\{S_d = 0\}$) is uniform. The number of integer tables satisfying the constraints is

$$Q(S_d = 0) = P\{S_d = 0\}\exp(I(P)) = P\{S_d = 0\}\prod_{jk}(1 + \mu_{jk})^{1 + \mu_{jk}}\mu_{jk}^{-\mu_{jk}}. \tag{23}$$

The probability $P\{S_d = 0\}$ is approximated by

$$\hat P\{S_d = 0\} = (2\pi)^{-d/2}|V_d|^{-1/2}\exp(-\kappa_3^2/72 + \kappa_4/24), \tag{24}$$

depending on the first four cumulants of $S_d$, as explained in section 2. See [BH09, BH10a] for further discussion.
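A minimal sketch of (21)-(24) for small tables follows, without the Edgeworth factor. It is our own illustration, not the method of [B09] or [BH09]: the parameters $\alpha_j, \beta_k$ of (22) are found by cyclic coordinate updates, each solved by bisection; $V_d$ is assembled from the geometric variances $\mu_{jk}(1 + \mu_{jk})$ with the last column sum dropped as in (20); and all function names and tolerances are hypothetical.

```python
import math


def _solve(target, others):
    """Find a with sum_b 1/(exp(a + b) - 1) = target by bisection (the sum decreases in a)."""
    lo = -min(others)                       # the sum blows up as a decreases to lo
    hi = lo + 1.0
    while sum(1.0 / math.expm1(hi + b) for b in others) > target:
        hi = lo + 2.0 * (hi - lo)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(1.0 / math.expm1(mid + b) for b in others) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def max_entropy_geometric(r, c, sweeps=2000, tol=1e-12):
    """Geometric means mu_jk with log(1 + 1/mu_jk) = alpha_j + beta_k and margins r, c."""
    alpha, beta = [1.0] * len(r), [1.0] * len(c)
    for _ in range(sweeps):
        alpha = [_solve(rj, beta) for rj in r]   # match row sums given beta
        beta = [_solve(ck, alpha) for ck in c]   # match column sums given alpha
        row_err = max(abs(sum(1.0 / math.expm1(a + b) for b in beta) - rj)
                      for a, rj in zip(alpha, r))
        if row_err < tol:
            break
    return [[1.0 / math.expm1(a + b) for b in beta] for a in alpha]


def log_det(a):
    """log det A for a symmetric positive definite matrix, by Gaussian elimination."""
    a = [row[:] for row in a]
    total = 0.0
    for i in range(len(a)):
        total += math.log(a[i][i])
        for j in range(i + 1, len(a)):
            f = a[j][i] / a[i][i]
            for k in range(i, len(a)):
                a[j][k] -= f * a[i][k]
    return total


def gaussian_table_count(r, c):
    """Maximum entropy Gaussian estimate of the number of tables, (23)-(24) without Edgeworth."""
    mu = max_entropy_geometric(r, c)
    m, n = len(r), len(c)
    lam = [[x * (1.0 + x) for x in row] for row in mu]   # geometric variances
    d = m + n - 1                                         # last column sum is dropped
    V = [[0.0] * d for _ in range(d)]
    for j in range(m):
        V[j][j] = sum(lam[j])
        for k in range(n - 1):
            V[j][m + k] = V[m + k][j] = lam[j][k]
    for k in range(n - 1):
        V[m + k][m + k] = sum(lam[j][k] for j in range(m))
    entropy = sum((1 + x) * math.log1p(x) - x * math.log(x) for row in mu for x in row)
    return math.exp(entropy - 0.5 * d * math.log(2 * math.pi) - 0.5 * log_det(V))
```

For $3 \times 3$ tables with all margins equal to 3 the exact count is 55, while this Gaussian estimate is roughly 52; at such small sizes the Edgeworth factor in (24), which the sketch omits, is not negligible.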

Each element of $S_d$ is the deviation from its mean of a sum of independent geometric variables with expectations $\{\mu_{jk}\}$. The mean-centered geometric characteristic function with expectation $\mu$ is

$$\psi_\mu(t) = e^{-i\mu t}\big/\big(1 - \mu(e^{it} - 1)\big). \tag{25}$$
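The closed form (25) can be checked against direct summation of the geometric series defined by (21). The Python sketch below (illustrative; the names are hypothetical) truncates the series at a few thousand terms, which is ample for moderate $\mu$.

```python
import cmath


def psi(mu, t):
    """Mean-centered geometric characteristic function (25)."""
    return cmath.exp(-1j * mu * t) / (1.0 - mu * (cmath.exp(1j * t) - 1.0))


def psi_by_summation(mu, t, terms=2000):
    """E exp(i t (X - mu)) for P{X = x} = (mu/(1+mu))^x / (1+mu), truncated at `terms`."""
    ratio = mu / (1.0 + mu)
    total = sum(ratio ** x / (1.0 + mu) * cmath.exp(1j * t * x) for x in range(terms))
    return cmath.exp(-1j * mu * t) * total
```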

From Theorem 1, the validity of the asymptotic estimate may be assessed by the limiting behavior of the characteristic function of $S_d$, with parameters

$$t_j = v_j,\ 1 \le j \le m, \qquad t_{m+k} = w_k,\ 1 \le k \le n - 1, \qquad w_n = 0,$$

$$\phi_d(t) = E\{\exp(i[v'(R - r) + w'(C - c)])\} = \prod_{jk}\psi_{\mu_{jk}}(v_j + w_k). \tag{26}$$

We sometimes use $t$ to refer to all the parameters in the characteristic function, and at other times use $v, w$ to treat separately the parameters associated with the rows and columns respectively.

Write $x_n \sim y_n$ if $x_n/y_n \to 1$, and $x_n \approx y_n$ if $\limsup|x_n/y_n| < \infty$ and $\limsup|y_n/x_n| < \infty$.



Theorem 2 Suppose that, as $d = m + n - 1 \to \infty$, $m \approx n \approx \min_j r_j \approx \max_j r_j \approx \min_k c_k \approx \max_k c_k$. Assume that

$$\liminf\frac{(1 + n/r_1)(1 + m/c_1)}{1 + mn/T} > 1, \tag{27}$$

where $r_1 = \max_j r_j$, $c_1 = \max_k c_k$ and $T = \sum_j r_j$.

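The cumulants $K_r^d(t)$ of $t'S_d$ are the sums of the corresponding cumulants of the geometric variables with expectations $\mu_{jk}$ and parameters $t_{jk} = v_j + w_k$:

$$\begin{aligned} K_2^d &= \sum_{jk} t_{jk}^2\mu_{jk}(1 + \mu_{jk}),\\ K_3^d &= \sum_{jk} t_{jk}^3\mu_{jk}(1 + \mu_{jk})(1 + 2\mu_{jk}),\\ K_4^d &= \sum_{jk} t_{jk}^4\mu_{jk}(1 + \mu_{jk})(1 + 6\mu_{jk}(1 + \mu_{jk})).\end{aligned} \tag{28}$$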

Let $K_2^d = t'V_dt$. Let $E_d$ denote expectation with respect to $t \sim N(0, V_d^{-1})$. Then

$$P\{S_d = 0\}\Big/(2\pi)^{-d/2}|V_d|^{-1/2}\exp\big[-E_d\{(K_3^d)^2\}/72 + E_dK_4^d/24\big] \to 1. \tag{29}$$

Remark on conditions: Our proof requires that the relative sizes of the maxi- 
mum entropy entries be bounded asymptotically, and that the absolute sizes are 
bounded away from zero and infinity. In [BH09] we prove validity of the Edgeworth 
approximation dropping the condition that the absolute sizes be bounded away 
from infinity. 
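The coefficients in (28) are the second, third and fourth cumulants of a geometric variable with mean $\mu$: $\mu(1 + \mu)$, $\mu(1 + \mu)(1 + 2\mu)$ and $\mu(1 + \mu)(1 + 6\mu(1 + \mu))$. As a sanity check, the hypothetical helpers below compute them from truncated central moments of (21), using $\kappa_4 = m_4 - 3m_2^2$, and compare with the closed forms.

```python
def geometric_cumulants(mu, terms=4000):
    """Cumulants 2-4 of the geometric law (21) with mean mu, from truncated central moments."""
    ratio = mu / (1.0 + mu)
    pmf = [ratio ** x / (1.0 + mu) for x in range(terms)]
    m2 = sum(p * (x - mu) ** 2 for x, p in enumerate(pmf))
    m3 = sum(p * (x - mu) ** 3 for x, p in enumerate(pmf))
    m4 = sum(p * (x - mu) ** 4 for x, p in enumerate(pmf))
    return m2, m3, m4 - 3.0 * m2 ** 2   # the third cumulant equals the third central moment


def closed_form_cumulants(mu):
    """The per-cell coefficients of t^2, t^3, t^4 in (28)."""
    lam = mu * (1.0 + mu)
    return lam, lam * (1.0 + 2.0 * mu), lam * (1.0 + 6.0 * lam)
```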
Proof:

We will show that conditions I-V of Theorem 1 hold.

Lemma 3.1 $\max_{jk}\mu_{jk} \approx \min_{jk}\mu_{jk} \approx 1$.

Proof:

Let $\Omega$ be the set of $m \times n$ matrices $\mu$ satisfying

$$\mu_{ij} > 0, \qquad \mu_{ij} \ge \mu_{kl} \text{ for } i \le k,\ j \le l, \qquad (1 + 1/\mu_{ij})(1 + 1/\mu_{kl}) = (1 + 1/\mu_{il})(1 + 1/\mu_{kj}). \tag{30}$$

Since the last equation holds if and only if $\log(1 + 1/\mu_{ij}) = \alpha_i + \beta_j$, these matrices consist of the maximum entropy geometric expectation matrices corresponding to the possible non-increasing positive row sums $r_1 \ge r_2 \ge \cdots \ge r_m > 0$ and the possible non-increasing positive column sums $c_1 \ge c_2 \ge \cdots \ge c_n > 0$.
Lemma 3.1.1

The maximum entry $\mu_{11}$ achieves its maximum over $\mu \in \Omega$ for given values of $r_1 = \sum_k\mu_{1k}$, $c_1 = \sum_j\mu_{j1}$, $T = \sum_{jk}\mu_{jk}$ when $\mu_{12} = \cdots = \mu_{1j} = \cdots = \mu_{1n}$ and $\mu_{21} = \cdots = \mu_{i1} = \cdots = \mu_{m1}$. And the minimum entry $\mu_{mn}$ achieves its minimum for given values of $r_m, c_n, T$ when $\mu_{m1} = \cdots = \mu_{mj} = \cdots = \mu_{m(n-1)}$ and $\mu_{1n} = \cdots = \mu_{in} = \cdots = \mu_{(m-1)n}$.

Proof:

The result is trivial if either $r_1 = T/m$ or $c_1 = T/n$; it will be useful, for uniqueness, to forbid these conditions.

We first prove that the maximum entry $\mu_{11}$ achieves its maximum over $\mu \in \Omega$ for given values of $r_1 = \sum_k\mu_{1k}$, $c_1 = \sum_j\mu_{j1}$, $T = \sum_{jk}\mu_{jk}$ when $\mu_{12} = \cdots = \mu_{1j} = \cdots = \mu_{1n}$ and $\mu_{21} = \cdots = \mu_{i1} = \cdots = \mu_{m1}$. Since by (30) $\mu$ is determined by its first row and column, it is equivalent to maximize $\mu_{11}$ over choices of $u = \{\mu_{j1}, 2 \le j \le m\}$, $z = \{\mu_{1k}, 2 \le k \le n\}$, for given values of $r_1, c_1, T$. We need to show that the maximal $\mu_{11}$ occurs when $(u, z) \in S$, the set on which all the $u$'s are equal and all the $z$'s are equal.

Consider first the maximization of $T$ over $\mu \in \Omega$ with $r_1, c_1, \mu_{11}$ fixed, which is equivalent to maximizing $T$ over choices of $(u, z)$ constrained to lie in a compact polyhedron on which $r_1, c_1, \mu_{11}$ are fixed.

Add a further constraint by fixing $z$, so that the maximization occurs by varying only the entries $u$. From (30), for $i > 1$,

$$(1 + 1/\mu_{ij}) = (1 + 1/\mu_{i1})(1 + 1/\mu_{1j})/(1 + 1/\mu_{11}) = \lambda_j(1 + 1/\mu_{i1}), \tag{31}$$

where $\lambda_j = (1 + 1/\mu_{1j})/(1 + 1/\mu_{11}) \ge 1$ is fixed given the first row, and by the forbidden equality, $\lambda_j > 1$ for at least one $j$. For $i > 1$, it follows that $\mu_{ij}$ is a concave function of $\mu_{i1}$ determined by the fixed $\lambda_j$, and by the forbidden equality $\sum_j\mu_{ij} = g(\mu_{i1})$, where $g$ is strictly concave in $\mu_{i1}$ and depends only on the fixed $\lambda_j$.

Thus $T = \sum_j\mu_{1j} + \sum_{i>1}g(\mu_{i1})$ is a strictly concave function of $u$, with a unique maximum at $u^*$, say. If $u^*_a \ne u^*_b$ for some $a, b$, then by strict concavity of $g$,

$$2g\big(\tfrac12[u^*_a + u^*_b]\big) > g(u^*_a) + g(u^*_b),$$

so the function $T$ may be improved by replacing both $u^*_a$ and $u^*_b$ by $\tfrac12[u^*_a + u^*_b]$, which leaves $c_1$ unchanged; this is a contradiction. Thus $\mu_{21} = \cdots = \mu_{i1} = \cdots = \mu_{m1}$ at the maximum.

Now return to the maximization of $T$ over $u, z$ with $r_1, c_1, \mu_{11}$ fixed. The maximum of $T$, say $T(\mu_{11})$, occurs for some $(u, z)$, and it may be improved, from the previous paragraph (and the same argument applied to $z$), unless $(u, z) \in S$, so these conditions hold at the maximum. In addition, the maximizing point $(u, z)$ is unique, given $r_1, c_1, \mu_{11}$. Thus, at the maximum,

$$T = r_1 + c_1 - \mu_{11} + (m - 1)(n - 1)\mu_{22}, \qquad 1 + \frac{1}{\mu_{22}} = \Big(1 + \frac{m - 1}{c_1 - \mu_{11}}\Big)\Big(1 + \frac{n - 1}{r_1 - \mu_{11}}\Big)\frac{\mu_{11}}{1 + \mu_{11}}. \tag{32}$$

It will be seen from (32) that $\mu_{22}$, and therefore $T(\mu_{11})$, are both decreasing functions of $\mu_{11}$.

Finally, we turn to the maximization of $\mu_{11}$ over $\mu \in \Omega$ with $r_1, c_1, T = T^0$ fixed, accomplished by considering all choices of $u, z$ constrained to lie in a compact set $\Gamma$ on which $r_1, c_1, T = T^0$ are fixed. Then $\mu_{11}$ is maximized at some point $(u^0, z^0)$ in $\Gamma$, with maximal value $\mu_{11}^0$. If $(u^0, z^0) \notin S$, we can find $(u^1, z^1) \in S$ maximizing $T$ for the given $r_1, c_1, \mu_{11}^0$, so that $T(\mu_{11}^0) > T^0$. Since $\mu_{11}^0$ is maximal, the value $\mu_{11}^1$ of $\mu_{11}$ at the point of $S$ with given $r_1, c_1, T = T^0$ must satisfy $\mu_{11}^1 \le \mu_{11}^0$. Also, the maximal value of $T$ given $r_1, c_1, \mu_{11}^1$ is achieved at the unique point of $S$, so that $T^0 = T(\mu_{11}^1)$. Since $T(\mu_{11})$ is decreasing in $\mu_{11}$, $T^0 = T(\mu_{11}^1) \ge T(\mu_{11}^0)$, which contradicts $T(\mu_{11}^0) > T^0$ and establishes that the maximum of $\mu_{11}$ over $\mu \in \Omega$ with $r_1, c_1, T = T^0$ fixed occurs at a point $(u, z) \in S$.

A similar argument is used for the minimum of $\mu_{mn}$: first minimize $T$ for fixed $r_m, c_n, \mu_{mn}$ over possible choices of the last row and column, showing that the values of the last column other than the last entry are equal, and the values of the last row other than the last entry are equal; then transfer this result to the minimization of $\mu_{mn}$ for fixed $r_m, c_n, T$.

This concludes the proof of Lemma 3.1.1.



From Lemma 3.1.1, the minimum entry $\mu_{mn}$ for given $r_m, c_n, T$ occurs when $\mu_{m1} = \cdots = \mu_{mj} = \cdots = \mu_{m(n-1)}$ and $\mu_{1n} = \cdots = \mu_{in} = \cdots = \mu_{(m-1)n}$. In this case

$$\begin{aligned} (n - 1)\mu_{m1} + \mu_{mn} = r_m &\implies \mu_{m1} \approx 1,\\ (m - 1)\mu_{1n} + \mu_{mn} = c_n &\implies \mu_{1n} \approx 1,\\ (n - 1)\mu_{11} + \mu_{1n} = r_1 &\implies \mu_{11} = O(1),\\ 1 + 1/\mu_{mn} = (1 + 1/\mu_{m1})(1 + 1/\mu_{1n})&/(1 + 1/\mu_{11}) = O(1).\end{aligned} \tag{33}$$

This guarantees that $\mu_{mn}$ is bounded away from zero in the extreme case where it takes its smallest value, so it must be bounded away from zero in every case. Also $\mu_{mn} \le r_m/n$ is bounded away from $\infty$ by the first assumption. Thus $\mu_{mn} \approx 1$, as required.

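The maximum entry $\mu_{11}$ for given $m, n, r_1, c_1, T$ occurs when $\mu_{12} = \cdots = \mu_{1j} = \cdots = \mu_{1n}$ and $\mu_{21} = \cdots = \mu_{i1} = \cdots = \mu_{m1}$. We will show for this maximal entry that $\limsup\mu_{11} < \infty$ if and only if $\liminf[(1 + n/r_1)(1 + m/c_1)/(1 + nm/T)] > 1$. In this case

$$\begin{aligned} (n - 1)\mu_{1n} + \mu_{11} &= r_1,\\ (m - 1)\mu_{m1} + \mu_{11} &= c_1,\\ (n - 1)\mu_{mn} + \mu_{m1} &= (T - r_1)/(m - 1),\\ 1 + 1/\mu_{11} &= (1 + 1/\mu_{m1})(1 + 1/\mu_{1n})/(1 + 1/\mu_{mn}).\end{aligned} \tag{34}$$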

It follows that $\mu_{1n} \approx 1$, $\mu_{m1} \approx 1$, $\mu_{mn} \sim T/mn$. If $\limsup\mu_{11} < \infty$, then $\mu_{1n} \sim r_1/n$, $\mu_{m1} \sim c_1/m$, and

$$(1 + 1/\mu_{11}) \sim (1 + n/r_1)(1 + m/c_1)/(1 + nm/T) > 1. \tag{35}$$

Conversely, if $\liminf[(1 + n/r_1)(1 + m/c_1)/(1 + nm/T)] > 1$,

$$(1 + 1/\mu_{11}) = (1 + 1/\mu_{m1})(1 + 1/\mu_{1n})/(1 + 1/\mu_{mn}) > (1 + n/r_1)(1 + m/c_1)/(1 + 1/\mu_{mn}), \tag{36}$$

so also $\liminf(1 + 1/\mu_{11}) > 1$, which implies $\limsup\mu_{11} < \infty$, as required. This concludes the proof of Lemma 3.1.

Lemma 3.3

$$\log|V_d| - d\log d = O(d). \tag{37}$$

Let $\delta_{ij} = 1$ if $i = j$, $\delta_{ij} = 0$ if $i \ne j$. Then

$$E_d(t_{ir}t_{js}) + O(d^{-2}) \approx [\delta_{ij} + \delta_{rs}]d^{-1}. \tag{38}$$
Proof:

Let $\lambda_{jk} = \mu_{jk}(1 + \mu_{jk})$. The quadratic form $t'V_dt = K_2^d = \sum_{jk}t_{jk}^2\lambda_{jk}$ is increasing in each $\lambda_{jk}$, so that the determinant $|V_d|$ is also increasing in each $\lambda_{jk}$; thus $|V_d| \le |V_d(\lambda_{11})|$, where $V_d(\lambda_{11})$ is the covariance matrix corresponding to the quadratic form $K_2^d = \sum_{jk}t_{jk}^2\lambda_{11}$, for which $|V_d(\lambda_{11})| = \lambda_{11}^d m^{n-1}n^{m-1}$. Similarly, $|V_d| \ge |V_d(\lambda_{mn})| = \lambda_{mn}^d m^{n-1}n^{m-1}$. Thus $\log|V_d| - d\log d = O(d)$. This result may also be obtained by noting that $|V_d|$ is a sum of $m^{n-1}n^{m-1}$ products of $d$ coefficients $\lambda_{jk}$.
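The determinant formula $|V_d(\lambda)| = \lambda^dm^{n-1}n^{m-1}$ for constant $\lambda_{jk} = \lambda$ can be verified exactly for small $m, n$. The sketch below (our own; names hypothetical) builds $V_d$ for the quadratic form $\sum_{jk}\lambda(v_j + w_k)^2$ with $w_n = 0$, ordered as $(v_1, \ldots, v_m, w_1, \ldots, w_{n-1})$, and computes its determinant in rational arithmetic.

```python
from fractions import Fraction


def constant_lambda_covariance(m, n, lam):
    """V_d for sum_jk lam * (v_j + w_k)^2 with w_n = 0, so d = m + n - 1."""
    d = m + n - 1
    V = [[Fraction(0)] * d for _ in range(d)]
    for j in range(m):
        V[j][j] = Fraction(n * lam)              # row j meets n columns
        for k in range(n - 1):
            V[j][m + k] = V[m + k][j] = Fraction(lam)
    for k in range(n - 1):
        V[m + k][m + k] = Fraction(m * lam)      # column k meets m rows
    return V


def det(a):
    """Exact determinant by Gaussian elimination (pivots are positive since V_d is positive definite)."""
    a = [row[:] for row in a]
    result = Fraction(1)
    for i in range(len(a)):
        result *= a[i][i]
        for j in range(i + 1, len(a)):
            f = a[j][i] / a[i][i]
            for k in range(i, len(a)):
                a[j][k] -= f * a[i][k]
    return result
```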

Again, since the quadratic form $t'V_dt$ is increasing in each $\lambda_{jk}$, necessarily the quadratic form $t'V_d^{-1}t$ is decreasing in each $\lambda_{jk}$, so bounds for the variances induced by $t \sim N(0, V_d^{-1})$ are obtained by setting all the $\lambda_{jk}$ equal to $\lambda_{11}$ or to $\lambda_{mn}$. This establishes that $E_dt_{jk}^2 \approx d^{-1}$.

To bound the off-diagonal terms in $V_d^{-1}$, note that $t \sim N(0, V_d^{-1})$ allows us to determine the conditional distribution of $v$ given $w$ from the quadratic form $t'V_dt$ with $w$ fixed, and similarly the conditional distribution of $w$ given $v$. Indeed the $v_j$ are independent given $w$, and the $w_k$ are independent given $v$. This gives a relationship between the $v$ and $w$ covariance matrices which produces the required bound on the off-diagonal terms. A result similar to Lemma 3.3 is proved in [BH09] using non-probabilistic methods.



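Define

$$a_i = 1/\lambda_{i\cdot}, \quad a = 1/\lambda_{\cdot\cdot}, \quad \alpha_{ij} = a_i\lambda_{ij}, \qquad \bar v = a\sum_i\lambda_{i\cdot}v_i, \quad \bar w = a\sum_j\lambda_{\cdot j}w_j, \quad \tilde v_i = v_i - \bar v, \quad \tilde w_j = w_j - \bar w, \tag{39}$$

where $\lambda_{i\cdot} = \sum_j\lambda_{ij}$, $\lambda_{\cdot j} = \sum_i\lambda_{ij}$ and $\lambda_{\cdot\cdot} = \sum_{ij}\lambda_{ij}$. Note that $\mu_{11} \approx \mu_{mn} \approx 1$ implies $\min_{ij}\alpha_{ij}/(a\lambda_{\cdot j}) \ge \epsilon$ for some $\epsilon > 0$.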
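And $v_i \mid w \sim N(-\sum_r\alpha_{ir}w_r, a_i)$ independently for different $i$. Then

$$\begin{aligned} E_d\{v_i \mid w\} &= -\textstyle\sum_r\alpha_{ir}w_r,\\ E_d\{v_iv_j \mid w\} &= a_i\delta_{ij} + \textstyle\sum_{rs}\alpha_{ir}\alpha_{js}w_rw_s,\\ E_d\{\tilde v_i\tilde v_j \mid w\} &= a_i\delta_{ij} - a + \textstyle\sum_{rs}\alpha_{ir}\tilde w_r\alpha_{js}\tilde w_s,\\ E_d\{\tilde v_i\bar v \mid w\} &= \textstyle\sum_r\alpha_{ir}\tilde w_r\bar w,\\ E_d\{\tilde v_i\tilde v_j\} &= a_i\delta_{ij} - a + \textstyle\sum_{rs}\alpha_{ir}\alpha_{js}E_d\{\tilde w_r\tilde w_s\}\\ &= a_i\delta_{ij} - a + \textstyle\sum_{rs}(\alpha_{ir} - \epsilon a\lambda_{\cdot r})\alpha_{js}E_d\{\tilde w_r\tilde w_s\},\end{aligned} \tag{40}$$

since $\sum_r\lambda_{\cdot r}\tilde w_r = 0$.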

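Note that $E_dw_r^2 \approx d^{-1}$, so that $E_d\tilde w_r^2 = O(d^{-1})$. Also $a \approx d^{-2}$, $\max_{ir}\alpha_{ir} \approx d^{-1}$, and $\tilde\alpha_{ir} = \alpha_{ir} - \epsilon a\lambda_{\cdot r} \ge 0$ with $\sum_r\tilde\alpha_{ir} = 1 - \epsilon$. Hence

$$\begin{aligned} E_d\tilde v_i\tilde v_j &\le a_i\delta_{ij} - a + O(d^{-1})\max_rE_d\tilde w_r^2 + \textstyle\sum_{r\ne s}\tilde\alpha_{ir}\alpha_{js}\max_{r\ne s}E_d\tilde w_r\tilde w_s\\ &\le a_i\delta_{ij} + O(d^{-2}) + (1 - \epsilon)\max_{r\ne s}E_d\tilde w_r\tilde w_s,\\ \max_{i\ne j}E_d\tilde v_i\tilde v_j &\le O(d^{-2}) + (1 - \epsilon)\max_{r\ne s}E_d\tilde w_r\tilde w_s.\end{aligned} \tag{41}$$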

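Similarly,

$$\begin{aligned} \min_{i\ne j}E_d\tilde v_i\tilde v_j &\ge O(d^{-2}) + (1 - \epsilon)\min_{r\ne s}E_d\tilde w_r\tilde w_s,\\ \max_{i\ne j}|E_d\tilde v_i\tilde v_j| &\le O(d^{-2}) + (1 - \epsilon)\max_{r\ne s}|E_d\tilde w_r\tilde w_s|.\end{aligned} \tag{42}$$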



The joint distribution of the $\tilde v_i, \tilde w_r$ depends on the joint distribution of the $t_{jk}$, and so does not depend on the particular linear combination of $v_j$, $1 \le j \le m$, and $w_k$, $1 \le k \le n$, that is set to zero to reduce the dimensionality of these $m + n$ terms to $d = m + n - 1$. Thus the reverse result holds, conditioning on the $\tilde v_i$:

$$\max_{r\ne s}|E_d\tilde w_r\tilde w_s| \le O(d^{-2}) + (1 - \epsilon)\max_{i\ne j}|E_d\tilde v_i\tilde v_j|. \tag{43}$$

Combining (42) and (43),

$$\max_{i\ne j}|E_d\tilde v_i\tilde v_j|,\ \max_{r\ne s}|E_d\tilde w_r\tilde w_s| = O(d^{-2}).$$

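A similar argument shows that $\max|E_d\tilde w_i\tilde v_j| = O(d^{-2})$. Also

$$t'V_dt = \sum_{ij}t_{ij}^2\lambda_{ij} = \sum_{ij}(\tilde v_i + \tilde w_j + \bar v + \bar w)^2\lambda_{ij} = \sum_{ij}(\tilde v_i + \tilde w_j)^2\lambda_{ij} + (\bar v + \bar w)^2/a, \tag{44}$$

so that $\bar v + \bar w$ is independent of the $\tilde v_i, \tilde w_j$, with variance $a \approx d^{-2}$. Concluding the proof of Lemma 3.3,

$$\begin{aligned} E_d(t_{ir}t_{js}) &= E_d(\tilde v_i + \tilde w_r + \bar v + \bar w)(\tilde v_j + \tilde w_s + \bar v + \bar w)\\ &= E_d\tilde v_i\tilde v_j + E_d\tilde v_i\tilde w_s + E_d\tilde w_r\tilde v_j + E_d\tilde w_r\tilde w_s + E_d(\bar v + \bar w)^2,\\ E_d(t_{ir}t_{js}) + O(d^{-2}) &\approx [\delta_{ij} + \delta_{rs}]d^{-1}.\end{aligned} \tag{45}$$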

We now apply Theorem 1 by verifying conditions I-V. Propositions similar to I-IV are proved using similar methods in [BH09].

CONDITION I: $\kappa_3^2 = O(1)$, $\kappa_4 = O(1)$.

$$\begin{aligned} \kappa_3^2 &= E_d(K_3^d)^2 = E_d\Big(\sum_{jk}t_{jk}^3\mu_{jk}(1 + \mu_{jk})(1 + 2\mu_{jk})\Big)^2 = O\Big(\sum_{jkrs}|E_dt_{jk}^3t_{rs}^3|\Big),\\ E_dt_{jk}^3t_{rs}^3 &= 9E_dt_{jk}^2\,E_dt_{rs}^2\,E_dt_{jk}t_{rs} + 6(E_dt_{jk}t_{rs})^3.\end{aligned} \tag{46}$$

From Lemma 3.3,

$$E_dt_{jk}t_{rs} = O(d^{-2} + (\delta_{jr} + \delta_{ks})d^{-1}), \qquad E_dt_{jk}^3t_{rs}^3 = O(d^{-4} + (\delta_{jr} + \delta_{ks})d^{-3}). \tag{47}$$

Of the $O(d^4)$ terms in the sum $\sum_{jkrs}E_dt_{jk}^3t_{rs}^3$, there are $O(d^3)$ terms in which $(\delta_{jr} + \delta_{ks}) > 0$; thus the sum over all terms is $O(1)$.

$\kappa_4 = E_dK_4^d$ is the sum of $d^2$ terms of $O(d^{-2})$, so it also is bounded.

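CONDITION II: $E_d\{Q_\epsilon\exp[\tfrac{1}{12}K_4^d(t)]\} = O(1)$.

For $X, Y$ jointly normal with mean zero,

$$\mathrm{cov}(X^4, Y^4) = 72\,EX^2\,EY^2\,E^2XY + 24\,E^4XY,$$

so that, by Lemma 3.3,

$$\mathrm{cov}(t_{jk}^4, t_{rs}^4) = O(d^{-6} + (\delta_{jr} + \delta_{ks})d^{-4}). \tag{48}$$

Since there are only $O(d^3)$ covariances for which $(\delta_{jr} + \delta_{ks}) > 0$,

$$E_d(K_4^d - \kappa_4)^2 = E_d(K_4^d - E_dK_4^d)^2 = O\Big(\sum_{jkrs}|\mathrm{cov}(t_{jk}^4, t_{rs}^4)|\Big) = O(d^{-1}). \tag{49}$$

From [D87] Corollary 5, since $K_4^d - \kappa_4$ is a polynomial of degree 4 in Gaussian variables,

$$P_d\{K_4^d > \kappa_4 + 1\} \le E_d|K_4^d - \kappa_4|^{2r} \le C_r[E_d(K_4^d - \kappa_4)^2]^r \le C_rd^{-r}. \tag{50}$$

When $t \sim N(0, V_d^{-1})$, the multivariate normal density is $A\exp[-\tfrac12K_2^d(t)]$, with $A$ a normalizing constant. Thus $E_d\exp[aK_2^d(t)] = (1 - 2a)^{-d/2}$ for $a < 1/2$. Also, since the $\mu_{jk}$ are bounded, $K_4^dQ_\epsilon \le C\epsilon^2K_2^dQ_\epsilon$. Thus, using the Cauchy-Schwarz inequality on the event $\{K_4^d > \kappa_4 + 1\}$,

$$\begin{aligned} E_dQ_\epsilon\exp\{\tfrac{1}{12}K_4^d(t)\} &\le \exp\{\tfrac{1}{12}(\kappa_4 + 1)\} + P_d^{1/2}\{K_4^d > \kappa_4 + 1\}\big(E_d\exp\{\tfrac16C\epsilon^2K_2^d(t)\}\big)^{1/2}\\ &= O(1) + P_d^{1/2}\{K_4^d > \kappa_4 + 1\}\big(1 - \tfrac13CM^2\log d/d\big)^{-d/4}\\ &= O(1) + C_r^{1/2}d^{-r/2}d^{CM^2/12}(1 + o(1)) = O(1) \quad\text{for } r > CM^2/6.\end{aligned} \tag{51}$$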



CONDITION III: $Q_\epsilon\big|\log\phi_d(t) - \sum_{r=2}^{4}i^rK_r^d(t)/r!\big| = o(1)$.

For a geometric variable with mean $\mu \approx 1$ and cumulants $\kappa_r(\mu)$, the log centered characteristic function has the standard Taylor series expansion

$$\log\psi_\mu(t) = \sum_{r=2}^{4}i^r\kappa_r(\mu)t^r/r! + O(1)|t|^5,$$

$$\log\phi_d(t) = \sum_{jk}\log\psi_{\mu_{jk}}(t_{jk}) = \sum_{r=2}^{4}i^rK_r^d(t)/r! + O(1)\sum_{jk}|t_{jk}|^5, \tag{52}$$

$$Q_\epsilon\sum_{jk}|t_{jk}|^5 = O(d^2\epsilon^5) = o(1),$$

as required.

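CONDITION IV: $E_dQ_\epsilon\exp[-iK_3^d(t)/6 + \kappa_3^2/72 + K_4^d(t)/24 - \kappa_4/24] \to 1$.

We will first show that $K_3^d = \sum_{jk}t_{jk}^3\mu_{jk}(1 + \mu_{jk})(1 + 2\mu_{jk})$ has the same moments in the limit as a normal distribution $N(0, \kappa_3^2)$.

Define $u_\alpha = t_{jk}[\mu_{jk}(1 + \mu_{jk})(1 + 2\mu_{jk})]^{1/3}$, where $\alpha$ ranges over the pairs of indices in $A = \{(j, k), 1 \le j \le m, 1 \le k \le n\}$. Let $G$ be the graph on $A$ with edges $(\alpha, \beta) \in G$ whenever either the first or the second indices of $\alpha, \beta$ are the same. In particular, $(\alpha, \alpha) \in G$. Writing $G(\alpha, \beta) = 1$ if $(\alpha, \beta) \in G$ and 0 otherwise,

$$E_du_\alpha u_\beta = O(d^{-2} + G(\alpha, \beta)d^{-1}), \qquad E_du_\alpha^3u_\beta^3 = O(d^{-4} + G(\alpha, \beta)d^{-3}). \tag{53}$$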

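Let $\{X_\alpha\}$ denote a multivariate normal with $EX_\alpha = 0$, $EX_\alpha X_\beta = E_du_\alpha^3u_\beta^3$. We will show that $K_3^d = \sum_\alpha u_\alpha^3$ and $\sum_\alpha X_\alpha$ have moments differing by $O(d^{-1})$. The first two moments are identical, by definition, and the odd moments are zero for both variables. For the $2r$th moment:

$$E\Big(\sum_\alpha X_\alpha\Big)^{2r} = \sum_{\alpha_1, \ldots, \alpha_{2r}}E(X_{\alpha_1}X_{\alpha_2}\cdots X_{\alpha_{2r}}). \tag{54}$$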

The terms $E(X_{\alpha_1}X_{\alpha_2}\cdots X_{\alpha_{2r}})$ tend to be larger when many of the pairs of $\alpha_i$ have edges in $G$; this size is compensated by the fact that fewer sets of $\alpha_1, \alpha_2, \ldots, \alpha_{2r}$ have many edged pairs. In order to count such sets, for each $\alpha_1, \alpha_2, \ldots, \alpha_{2r}$ we define a set of directed trees $\tau(\alpha) = \{\tau_1(\alpha), \tau_2(\alpha), \ldots, \tau_t(\alpha)\}$ on the $\alpha$-indices $(1, 2, \ldots, 2r)$.

The tree $\tau_1(\alpha)$ is initialised with root 1; then progress through the $\alpha$-indices in order, attaching $j$ to $k$ if $(\alpha_j, \alpha_k) \in G$ and $k$ is the smallest index already attached to the tree for which $(\alpha_j, \alpha_k) \in G$. The tree $\tau_i(\alpha)$ is constructed similarly on the set of $\alpha$-indices not attached to the trees $\{\tau_1(\alpha), \tau_2(\alpha), \ldots, \tau_{i-1}(\alpha)\}$: begin with the root, the lowest $\alpha$-index not attached to previous trees, and progress through the $\alpha$-indices in order, attaching $j$ to $k$ if $(\alpha_j, \alpha_k) \in G$ and $k$ is the smallest $\alpha$-index already attached to the tree $\tau_i(\alpha)$ for which $(\alpha_j, \alpha_k) \in G$.

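For a set of trees $\tau = \{\tau_1, \tau_2, \ldots, \tau_t\}$ partitioning the $\alpha$-indices $\{1, \ldots, 2r\}$, the number of $\alpha_1, \alpha_2, \ldots, \alpha_{2r}$ for which $\tau(\alpha) = \tau$ is $O(d^{2r+t})$. To see this, consider the $i$th tree $\tau_i$, which has, say, $\alpha$-indices $j_1, j_2, \ldots, j_{n_i}$, with $j_1$ the root. As $(\alpha_{j_1}, \ldots, \alpha_{j_{n_i}})$ passes through the $O(d^{2n_i})$ possible values in $A^{n_i}$, $\alpha_{j_1}$ passes through $O(d^2)$ values, but the remaining $\alpha_{j_k}$ in the tree $\tau_i$ each pass through only $O(d)$ values, since each such $\alpha_{j_k}$ is constrained by $(\alpha_{j_k}, \alpha_{j_{k'}}) \in G$ for some fixed $k' < k$. Thus the number of $\alpha_{j_1}, \alpha_{j_2}, \ldots, \alpha_{j_{n_i}}$ with $\tau_i(\alpha) = \tau_i$ is $O(d^{n_i+1})$. Noting that $\sum_i n_i = 2r$, the number of $\alpha_1, \alpha_2, \ldots, \alpha_{2r}$ for which $\tau(\alpha) = \tau$ is the product of these quantities, $O(d^{2r+t})$.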

For a particular $\alpha_1, \alpha_2, \ldots, \alpha_{2r}$ with trees $\tau(\alpha)$ of sizes $n_1, \ldots, n_t$, Wick's formula for $E(X_{\alpha_1}X_{\alpha_2}\cdots X_{\alpha_{2r}})$ is the sum, over all partitions into $r$ sets of pairs of variables, of the product of the covariances for those variables. The maximal order products occur when the pairs of variables designated in the partition lie as frequently as possible within one of the trees in $\tau(\alpha)$. Smaller order terms may be ignored because their number is bounded for $r$ fixed. If all the tree sizes $n_i$ are even, the maximal product occurs when each pair of variables designated in the partition has an edge in one of the trees; the covariance for each such pair is $O(d^{-3})$, so the product is $O(d^{-3r})$. If there are $s$ odd terms $n_i$, there are $s/2$ pairs lying in different trees and having smaller covariances, so

$$E(X_{\alpha_1}X_{\alpha_2}\cdots X_{\alpha_{2r}}) = O(d^{-3r-s/2}), \qquad \sum_{\alpha \mid \tau(\alpha) = \tau}E(X_{\alpha_1}X_{\alpha_2}\cdots X_{\alpha_{2r}}) = O(d^{-r+t-s/2}). \tag{55}$$
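Wick's formula, used repeatedly in this argument, expresses a Gaussian moment as a sum over pairings. The short recursive sketch below (illustrative; names hypothetical) enumerates the pairings directly. It reproduces $EX^4 = 3\sigma^4$ and $EX^6 = 15\sigma^6$, and, for unit variances with correlation $\rho$, the identities $EX^3Y^3 = 9\rho + 6\rho^3$ and $\mathrm{cov}(X^4, Y^4) = 72\rho^2 + 24\rho^4$ that lie behind (46) and (48).

```python
def wick_moment(cov, indices):
    """E(X_{i1} ... X_{ik}) for a mean-zero Gaussian vector with covariance matrix `cov`.

    Wick's formula: sum over all pairings of the indices of the product of covariances.
    `indices` is a tuple of coordinate positions, repeated as needed.
    """
    if not indices:
        return 1.0
    if len(indices) % 2 == 1:
        return 0.0
    first, rest = indices[0], indices[1:]
    total = 0.0
    for pos, partner in enumerate(rest):
        remaining = rest[:pos] + rest[pos + 1:]
        total += cov[first][partner] * wick_moment(cov, remaining)
    return total
```

The recursion pairs the first index with each remaining one in turn, so it visits all $(2r - 1)!!$ pairings; this is fine for the small moments checked here but grows quickly with $r$.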

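Now

$$-r + t - s/2 = -\frac12\sum_i n_i + t - s/2 = \frac12\sum_{n_i \text{ even}}(2 - n_i) + \frac12\sum_{n_i \text{ odd}}(1 - n_i). \tag{56}$$

Thus

$$\sum_{\alpha \mid \tau(\alpha) = \tau}E(X_{\alpha_1}X_{\alpha_2}\cdots X_{\alpha_{2r}}) = \begin{cases} O(1) & \text{if } \max_i n_i \le 2,\\ O(d^{-1}) & \text{if } \max_i n_i > 2.\end{cases} \tag{57}$$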

For a particular $\alpha_1, \alpha_2, \ldots, \alpha_{2r}$ with trees $\tau(\alpha)$ of sizes $n_1, \ldots, n_t$, Wick's formula for $E_d(u_{\alpha_1}^3u_{\alpha_2}^3\cdots u_{\alpha_{2r}}^3)$ is the summation, over all partitions into $3r$ sets of pairs of variables, of the product of the covariances for those variables. Again, the maximal terms occur when the pairs of variables lie as frequently as possible within the trees of $\tau$. If all the tree sizes are even, the maximal product is $O(d^{-3r})$. If there are $s$ odd tree sizes, there are $s/2$ pairs with smaller covariances, so again

$$\sum_{\alpha \mid \tau(\alpha) = \tau}E_d(u_{\alpha_1}^3u_{\alpha_2}^3\cdots u_{\alpha_{2r}}^3) = \begin{cases} O(1) & \text{if } \max_i n_i \le 2,\\ O(d^{-1}) & \text{if } \max_i n_i > 2.\end{cases} \tag{58}$$

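Thus, in equation (54) we need only consider summation over $\alpha$ whose trees have maximal size 2. Let $\tau(2k, 2r - 2k)$ denote the trees $\{(1), (2), (3), \ldots, (2k), (2k + 1, 2k + 2), \ldots, (2r - 1, 2r)\}$. There are $\binom{2r}{2k}(2r - 2k - 1)!!$ such partitions of the $\alpha$-indices, each equivalent up to relabelling to $\tau(2k, 2r - 2k)$, with $2k$ trees of size 1 and $r - k$ trees of size 2.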

Case 1. p(0,2r): All trees of size 2 

For example, a = (11), (12), (22), (23) has trees {(1, 2), (3, 4)}. 

For a particular a with this partition, Wick's formula for E(X ai X a . 2 ..X a2r ) 
gives a term E(X ai X a2 )..E(X ct2r _ 1 X a2r ) of 0(d~ 3r ) when the Wick's partition 
corresponds to r(0, 2r), and terms of 0(d~ 3r_1 ) when the Wick's partition includes 
some terms that are not concordant with r(0, 2r). 

Also, Wick's formula for $E_d(u^3_{a_1}u^3_{a_2}\cdots u^3_{a_{2r}})$ gives a term $E_d(u^3_{a_1}u^3_{a_2})\cdots E_d(u^3_{a_{2r-1}}u^3_{a_{2r}})$ of order $O(d^{-3r})$ by summing over the partitions of the $6r$ variables $u_{a_i}$ that conform to $\tau(0, 2r)$; for example, the variables $u_{a_1}, u_{a_1}, u_{a_1}, u_{a_2}, u_{a_2}, u_{a_2}$ can be paired in 15 ways. All other partitions of the $6r$ variables have at least one pairing not conforming to $\tau(0, 2r)$, and the corresponding covariance for that pair is $O(d^{-2})$, so that the contribution of all other partitions is $O(d^{-3r-1})$.
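Wick's formula (Isserlis' theorem) can be made concrete by enumerating perfect matchings. The sketch below, with hypothetical helper names, lists the pairings of a multiset of zero-mean Gaussian variables and sums the products of covariances. For three copies each of two variables it finds the 15 pairings mentioned above and reproduces $E(X^3Y^3) = 9\,EX^2\,EY^2\,EXY + 6(EXY)^3$; when $EXY = O(d^{-2})$ the second term is $O(d^{-6})$, which is the kind of term dropped in Case 2.

```python
def pairings(items):
    """Yield every perfect matching of the list `items` as a list of 2-tuples."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail


def wick_moment(labels, cov):
    """E[prod_i X_{labels[i]}] for a zero-mean Gaussian vector.

    `labels` may repeat a variable (e.g. "xxxyyy" for X^3 Y^3) and
    `cov[a][b]` is the covariance of X_a and X_b.
    """
    labels = list(labels)
    if len(labels) % 2:
        return 0.0  # odd moments of a centred Gaussian vanish
    total = 0.0
    for matching in pairings(labels):
        term = 1.0
        for a, b in matching:
            term *= cov[a][b]
        total += term
    return total


if __name__ == "__main__":
    cov = {"x": {"x": 2.0, "y": 0.5}, "y": {"x": 0.5, "y": 3.0}}
    print(len(list(pairings(list("xxxyyy")))))  # 15 pairings of six variables
    print(wick_moment("xxxyyy", cov), 9 * 2.0 * 3.0 * 0.5 + 6 * 0.5 ** 3)
```

Brute-force enumeration grows like $(2p-1)!!$ in the number $2p$ of factors, so it is only a check for small moments, not a way to evaluate the sums over $a$.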

By definition, $E(X_{a_1}X_{a_2})\cdots E(X_{a_{2r-1}}X_{a_{2r}}) = E(u^3_{a_1}u^3_{a_2})\cdots E(u^3_{a_{2r-1}}u^3_{a_{2r}})$. Thus

(59) $E(X_{a_1}X_{a_2}\cdots X_{a_{2r}}) = E(u^3_{a_1}u^3_{a_2}\cdots u^3_{a_{2r}}) + O(d^{-3r-1}).$

Case 2. $\tau(2r, 0)$: all trees of size 1.
For a particular $a$ with this partition, Wick's formula for $E(X_{a_1}X_{a_2}\cdots X_{a_{2r}})$ sums $E(X_{a_{i_1}}X_{a_{i_2}})\cdots E(X_{a_{i_{2r-1}}}X_{a_{i_{2r}}})$ over all partitions of $a$ into $r$ pairs of variables. Wick's formula for $E_d(u^3_{a_1}u^3_{a_2}\cdots u^3_{a_{2r}})$ consists of a leading term in which, for each $i$, two of the three copies of $u_{a_i}$ are paired with each other; in the other terms, at least one $u_{a_i}$ has all three copies paired with variables $u_a$ to which it is unlinked, and the corresponding covariances are of smaller order. The leading term is thus the sum of $9^r\,Eu^2_{a_1}Eu^2_{a_2}\cdots Eu^2_{a_{2r}}\,E(u_{a_{i_1}}u_{a_{i_2}})\cdots E(u_{a_{i_{2r-1}}}u_{a_{i_{2r}}})$ over all partitions of $a$ into $r$ pairs of variables.

Noting that $E(X_{a_1}X_{a_2}) = 9\,Eu^2_{a_1}Eu^2_{a_2}E(u_{a_1}u_{a_2}) + O(d^{-6})$, we obtain

(60) $E(X_{a_1}X_{a_2}\cdots X_{a_{2r}}) = E(u^3_{a_1}u^3_{a_2}\cdots u^3_{a_{2r}}) + O(d^{-4r-1}).$

Case 3. $\tau(2k, 2r-2k)$: $2k$ trees of size 1, $r-k$ trees of size 2

For a particular $a$ with this tree structure, Wick's formula for $E(X_{a_1}X_{a_2}\cdots X_{a_{2r}})$ has leading product terms in which the partition of the $2r$ terms is such that the terms $X_{a_{2k+1}}X_{a_{2k+2}}\cdots X_{a_{2r}}$ are paired conforming to the last $r-k$ trees of size 2 in $\tau(2k, 2r-2k)$. Thus

(61) $E(X_{a_1}X_{a_2}\cdots X_{a_{2r}}) = E(X_{a_1}X_{a_2}\cdots X_{a_{2k}})\,E(X_{a_{2k+1}}X_{a_{2k+2}}\cdots X_{a_{2r}}) + O(d^{-3r-k-1}).$

Similarly, for a particular $a$ with this partition, Wick's formula for $E_d(u^3_{a_1}u^3_{a_2}\cdots u^3_{a_{2r}})$ has leading terms in which the partition of the $6r$ terms is such that the terms $u^3_{a_{2k+1}}u^3_{a_{2k+2}}\cdots u^3_{a_{2r}}$ are paired conforming to the last $r-k$ trees of size 2 in $\tau(2k, 2r-2k)$. Thus

(62) $E_d(u^3_{a_1}\cdots u^3_{a_{2r}}) = E_d(u^3_{a_1}\cdots u^3_{a_{2k}})\,E_d(u^3_{a_{2k+1}}u^3_{a_{2k+2}}\cdots u^3_{a_{2r-1}}u^3_{a_{2r}}) + O(d^{-3r-k-1}).$

From the equivalences in Case 1 and Case 2,

(63) $E_d(u^3_{a_1}\cdots u^3_{a_{2r}}) = E(X_{a_1}X_{a_2}\cdots X_{a_{2r}}) + O(d^{-3r-k-1}).$

Since there are $O(d^{3r+k})$ different $a$ with the trees $\tau(2k, 2r-2k)$,

(64) $\sum_{a:\,\tau(a)=\tau(2k,2r-2k)} E_d(u^3_{a_1}\cdots u^3_{a_{2r}}) = \sum_{a:\,\tau(a)=\tau(2k,2r-2k)} E(X_{a_1}X_{a_2}\cdots X_{a_{2r}}) + O(d^{-1}).$

Since this equivalence holds for all partitions with element size at most 2, and the contributions from other partitions are negligible,

(65) $\sum_a E_d(u^3_{a_1}\cdots u^3_{a_{2r}}) = \sum_a E(X_{a_1}X_{a_2}\cdots X_{a_{2r}}) + O(d^{-1}),$

as required.

We have shown that $K^3_d = \sum_a u^3_a$ and $\sum_a X_a$ have moments differing by $O(d^{-1})$. Since $\sum_a X_a\,(\kappa_3^2)^{-1/2} \sim N(0, 1)$, and a normal random variable is determined uniquely by its moments, $K^3_d\,(\kappa_3^2)^{-1/2} \to N(0, 1)$ in distribution as $d \to \infty$.

For $Z \sim N(0, 1)$, $P\{|Z| > A\} \le \exp(-\tfrac12 A^2)$.

Thus $Q_\varepsilon \to 1$ in probability as $d \to \infty$, since for $M$ large enough

(66) $P_d\{Q_\varepsilon = 0\} \le \sum_i P_d\{|t_i| > M\sqrt{\log d/d}\} \le d\exp(-M^2\log d/O(1)) \to 0$ as $d \to \infty$.
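The Gaussian tail bound and the union bound in (66) can be checked numerically. The sketch below assumes, for illustration only, that each $t_i$ is Gaussian with variance at most $c/d$ for a constant $c$ (our placeholder for the $O(1)$ in (66)). With $\varepsilon = M\sqrt{\log d/d}$ the bound becomes $d \cdot d^{-M^2/(2c)}$, which tends to 0 once $M^2 > 2c$.

```python
import math


def normal_tail(a):
    """P{|Z| > a} for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(a / math.sqrt(2.0))


def union_bound(d, M, c):
    """d * exp(-M^2 log d / (2c)): a bound on P{max_i |t_i| > M sqrt(log d / d)},
    assuming (hypothetically) that every t_i is N(0, s_i^2) with s_i^2 <= c / d."""
    return d * math.exp(-(M ** 2) * math.log(d) / (2.0 * c))


if __name__ == "__main__":
    grid = [0.1 * k for k in range(101)]
    worst = max(normal_tail(a) / math.exp(-0.5 * a * a) for a in grid)
    print("largest ratio P{|Z|>A} / exp(-A^2/2) on [0, 10]:", worst)
    for d in (10, 100, 1000, 10000):
        print(d, union_bound(d, M=2.0, c=1.0))
```

With $M = 2$ and $c = 1$ the bound is exactly $1/d$, so it decreases as $d$ grows; a smaller $M$ relative to $c$ would make it blow up, which is why $M$ must be chosen large enough.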
Thus $E_dQ_\varepsilon\exp[-iK^3_d(t)\,T(\kappa_3^2)^{-1/2}] \to E\exp[iT\,N(0,1)] = \exp(-\tfrac12 T^2)$ uniformly in any finite interval $T^2 \le A$. Since $\kappa_3^2 \le C$, the convergence is uniform in $T^2 \le \kappa_3^2A/C$. Now choose $A = C$ to get convergence at $T = \tfrac16(\kappa_3^2)^{1/2}$:

(67) $E_dQ_\varepsilon\exp[-\tfrac{i}{6}K^3_d(t) + \tfrac{1}{72}\kappa_3^2] \to 1.$

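Since $K^4_d(t) - \kappa_4 \to 0$ in probability, and using condition IV,

(68) $\big|E_dQ_\varepsilon\exp[-\tfrac{i}{6}K^3_d(t) + \tfrac{1}{72}\kappa_3^2]\big(\exp\{\tfrac{1}{24}[K^4_d(t) - \kappa_4]\} - 1\big)\big| \to 0.$

Thus, as required, $E_dQ_\varepsilon\exp[-\tfrac{i}{6}K^3_d(t) + \tfrac{1}{72}\kappa_3^2 + \tfrac{1}{24}(K^4_d(t) - \kappa_4)] \to 1$.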

CONDITION V: For some $M$, with $\varepsilon = M\sqrt{\log d/d}$, $\int(1 - Q_\varepsilon)|\phi_d| \big/ \int Q_\varepsilon|\phi_d| = o(1)$.
A similar result is proved in [BH09] using analytic methods.

Proof: We define a probability $P_d$ on $t_1, \ldots, t_{m+n} = v_1, \ldots, v_m, w_1, \ldots, w_n \in (-\pi, \pi]^{m+n}$ with density proportional to $|\phi_d|$. To prove condition V, we need to show, for some $M$, that $P_d\{\max_i|t_i| > \varepsilon \mid t_{m+n} = 0\} \to 0$ as $d \to \infty$. The method evaluates the conditional probability of large deviations in any single parameter $t_i$ when the rest of the parameters are well behaved.

Since the geometric variable is integer-valued, the geometric characteristic function has period $2\pi$, so individual geometric characteristic functions $\psi_{\mu_{jk}}$ have values near 1 when the argument $v_j + w_k$ has values near $2\pi$ or $-2\pi$. This will not happen for many pairs $v_j, w_k$, but it is best handled by transforming each $v_j$ and $w_k$ from $(-\pi, \pi]$ to the unit circle $\{e^{ix}\}$; after the transformation, $v_j$ and $w_k$ denote points on the circle:

(69) $v_j \mapsto e^{-iv_j}$, $w_k \mapsto e^{iw_k}$, $\bar v = \frac1m\sum_j v_j$, $\bar w = \frac1n\sum_k w_k$.



Lemma 3.4: With constants $O(1)$ independent of $d, j, k$,

(70) $\exp[-|v_j - w_k|^2O(1)] \le |\psi_{\mu_{jk}}(v_j + w_k)| \le \exp[-|v_j - w_k|^2/O(1)].$



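Proof:

For constants $\kappa(\mu)$, $K(\mu)$, and for all $t$,

(71) $\exp[-|e^{it} - 1|^2\kappa(\mu)] \le |\psi_\mu(t)|^2 = \dfrac{1}{1 + \mu(1+\mu)|e^{it} - 1|^2} \le \exp[-|e^{it} - 1|^2K(\mu)].$

Also $|e^{i(v_j + w_k)} - 1|^2 = |v_j - w_k|^2$, where on the right $v_j, w_k$ are the unit-circle points of (69). Since $\mu_{jk} \asymp 1$, the lemma is proved.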
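Identity (71) and the unit-circle identity can be checked numerically. For a geometric variable on $\{0, 1, \ldots\}$ with mean $\mu$, $\psi_\mu(t) = 1/(1 - \mu(e^{it} - 1))$, which gives the modulus in (71). Since $|e^{it} - 1|^2 \le 4$, one admissible choice of constants (ours, not necessarily the paper's) is $\kappa(\mu) = \mu(1+\mu)$ and $K(\mu) = \mu(1+\mu)/(1 + 4\mu(1+\mu))$; the sketch verifies both bounds on a grid of $t$.

```python
import cmath
import math


def geometric_cf(t, mu):
    """Characteristic function of P{X = k} = (1/(1+mu)) * (mu/(1+mu))**k."""
    return 1.0 / (1.0 - mu * (cmath.exp(1j * t) - 1.0))


def geometric_cf_series(t, mu, terms=400):
    """Truncated series sum_k P{X = k} e^{itk}, as an independent check."""
    q = mu / (1.0 + mu)
    return sum((1.0 - q) * q ** k * cmath.exp(1j * t * k) for k in range(terms))


def check_bounds(mu, num=721):
    """Check |psi|^2 = 1/(1 + kappa|e^{it}-1|^2) and the two bounds in (71)."""
    kappa = mu * (1.0 + mu)               # lower-bound constant (assumed choice)
    big_k = kappa / (1.0 + 4.0 * kappa)   # upper-bound constant (assumed choice)
    for i in range(num):
        t = -math.pi + 2.0 * math.pi * i / (num - 1)
        z2 = abs(cmath.exp(1j * t) - 1.0) ** 2
        mod2 = abs(geometric_cf(t, mu)) ** 2
        assert abs(mod2 - 1.0 / (1.0 + kappa * z2)) < 1e-12
        assert math.exp(-z2 * kappa) <= mod2 + 1e-12
        assert mod2 <= math.exp(-z2 * big_k) + 1e-12
    return True


if __name__ == "__main__":
    print(all(check_bounds(mu) for mu in (0.1, 1.0, 3.0)))
    v, w = 0.3, -1.2
    print(abs(cmath.exp(1j * (v + w)) - 1.0), abs(cmath.exp(-1j * v) - cmath.exp(1j * w)))
```

The lower bound uses $\log(1+y) \le y$ and the upper bound uses $\log(1+y) \ge y/(1+y)$ with $y = \kappa|e^{it}-1|^2 \le 4\kappa$.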
Lemma 3.5: Define

(72) $R^2 = \sum_{j,k}|v_j - w_k|^2$.

Then, for some $M$, $P_d\{R > d\varepsilon\} = \exp[-d/O(1)]$.

This lemma guarantees that only $t$ values where most of the $|v_j - w_k|$ are small make significant contributions to the probabilities $P_d$.

Proof: From (70),

(73) $\prod_{j,k}|\psi_{\mu_{jk}}(v_j + w_k)| \le \exp[-R^2/O(1)].$
We have previously used $\int$ to denote integration over the $d$ variables $t_1, \ldots, t_{m+n-1}$, and we will now use $\int_{m+n}$ to denote integration over all variables $t_1, \ldots, t_{m+n}$. From conditions I-IV, Theorem 2 implies that $\int Q_\varepsilon\phi_d \big/ P\{X_d = 0\} \to 1$, so $\int|\phi_d| \ge |\int Q_\varepsilon\phi_d| = \exp(-\tfrac12 d\log d + O(d))$. The integral of $|\phi_d|$ over the first $m+n-1$ parameters is the same for each choice of $t_{m+n}$, so the integral over all $m+n$ parameters is $\int_{m+n}|\phi_d| = 2\pi\int|\phi_d|$.

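Thus, for $M$ large,

(74) $\int_{m+n}\{R > d\varepsilon\}|\phi_d| \le \int_{m+n}\exp[-d^2\varepsilon^2/O(1)] \le \exp[-M^2d\log(d)/O(1) + O(d)]$,

$P_d\{R > d\varepsilon\} = \int_{m+n}\{R > d\varepsilon\}|\phi_d| \big/ \int_{m+n}|\phi_d| = \exp[-d/O(1)].$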

Lemma 3.6: For $M$ large enough, $\max_i P_d\{|v_i - \bar w| > \varepsilon\} = \exp[-d/O(1)]$.
Proof:

(75) From Lemma 3.5, for some $M$, $P_d\{R > d\varepsilon\} \to 0$ as $d \to \infty$.

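Define $R_{-i}^2 = \sum_{j \ne i}\sum_k|v_j - w_k|^2$. Of course $R_{-i} \le R$. For $i \le m$,

(76) $R_{-i} < d\varepsilon \Rightarrow m\sum_k|w_k - \bar w|^2 < d^2\varepsilon^2 \Rightarrow \min_k|w_k - \bar w| < \left(\frac{d^2}{nm}\right)^{1/2}\varepsilon = \varepsilon_1.$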

By the triangle inequality, if $k$ attains the minimum in (76), the interval $I_k = \{v : |v - w_k| < \varepsilon_1\}$ on the unit circle, of length at least $2\varepsilon_1$, is such that $|v - \bar w| < 2\varepsilon_1$ for $v \in I_k$.

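Letting $t_{-i} = \{t_j, j \ne i\}$, note that the conditional density of $t_i \mid t_{-i}$ is proportional to $\prod_k|\psi_{ik}|$. Then, for $t_{-i}$ satisfying $R_{-i} < \varepsilon$, and $M_2$ chosen large enough,

(77) $\exp[-\sum_k|v_i - w_k|^2O(1)] \le \prod_k|\psi_{ik}| \le \exp[-d|v_i - \bar w|^2/O(1)]$,
$P_d\{|v_i - \bar w| > \varepsilon_2 \mid t_{-i}\} \le \exp[-d\varepsilon_2^2/O(1)] \big/ \int\prod_k|\psi_{ik}|\,dt_i$,
$1 \ge P_d\{|v_i - \bar w| < 2\varepsilon_1 \mid t_{-i}\} \ge \exp[-d\varepsilon_1^2O(1)]\int\{|v_i - \bar w| < 2\varepsilon_1\}\,dt_i \big/ \int\prod_k|\psi_{ik}|\,dt_i$,
$1 \ge 2\varepsilon_1\exp[-d\varepsilon_1^2O(1)] \big/ \int\prod_k|\psi_{ik}|\,dt_i$,
$P_d\{|v_i - \bar w| > \varepsilon_2 \mid t_{-i}\} \le \exp[-d\varepsilon_2^2/O(1) + d\varepsilon_1^2O(1)]/(2\varepsilon_1) = \exp[-d/O(1)]$.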

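The same $M_2$ holds for all $i$ because $\mu_{jk} \asymp 1$, so the $O(1)$ bounds hold for all $i$. Finally, again with the same $O(1)$ for all $i$,

(78) $P_d\{|v_i - \bar w| > \varepsilon_2\} = E_d[P_d\{|v_i - \bar w| > \varepsilon_2 \mid t_{-i}\}\{R_{-i} < \varepsilon\}] + E_d[P_d\{|v_i - \bar w| > \varepsilon_2 \mid t_{-i}\}\{R_{-i} \ge \varepsilon\}]$
$\le \exp[-d/O(1)]\,P_d\{R_{-i} < \varepsilon\} + P_d\{R_{-i} \ge \varepsilon\}$,
$\max_i P_d\{|v_i - \bar w| > \varepsilon_2\} = \exp[-d/O(1)]$.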

Now, under $P_d$, the variable $w_n$ is independent of the variable $\max_{ij}|t_i - t_j|$. Also, if $\max_{ij}|t_i - t_j| < \varepsilon < 1$ and $w_n = 1$, then $\max_i|t_i| < 2\varepsilon$. (We need to constrain $\varepsilon$ so that $\max_i|t_i| < \pi/2$ to avoid difficulties with the period $2\pi$ of the geometric characteristic function.) Then, for some constants $M_2, M_3, M_4, M_5, M_6$,
(79)
$P_d\{\max_i|v_i - \bar w| > \varepsilon_2\} \le \sum_i P_d\{|v_i - \bar w| > \varepsilon_2\} = \exp(-d/O(1))$,
$P_d\{\max_k|w_k - \bar v| > \varepsilon_3\} = \exp(-d/O(1))$,
$P_d\{|\bar w - \bar v| > \varepsilon_4\} = \exp(-d/O(1))$,
$P_d\{\max_{ij}|t_i - t_j| > \varepsilon_5\} = \exp(-d/O(1))$,
$P_d\{\max_{ij}|t_i - t_j| > \varepsilon_5 \mid w_n = 1\} = \exp(-d/O(1))$,
$P_d\{\max_i|t_i| > \varepsilon_6 \mid w_n = 1\} = \exp(-d/O(1))$.

This concludes the proof of the validity of condition V. 
4 Equal row and column sums

Consider the special case of [CM07] where the row sums are equal and the column sums are equal, so that $r_j = \mu n$, $c_k = \mu m$. In this case $|V_d| = n^{m-1}m^{n-1}\sigma^{2(m+n-1)}$, where $\sigma^2 = \mu(1+\mu)$. In moment calculations, it is convenient to consider the linear transform

(80) $U = \sum_j v_j/m + \sum_k w_k/n$,
$V_j = v_j + \sum_k w_k/n$, $1 \le j \le m$,
$W_k = w_k + \sum_j v_j/m$, $1 \le k \le n$.

Note that $Q_\varepsilon(t) = 1 \Rightarrow |U| < 2\varepsilon$, $|V_j| < 2\varepsilon$, $|W_k| < 2\varepsilon$. When $t \sim N(0, V_d^{-1})$, the $U, V, W$ are multivariate Gaussian in $d$ dimensions with

(81) $U \sim N(0, 1/(mn\sigma^2))$,
$V_j \sim N(0, 1/(n\sigma^2))$ independent, $1 \le j \le m$,
$W_k \sim N(0, 1/(m\sigma^2))$ independent, $1 \le k \le n$,
$U$, $V_j - U$, $W_k - U$ independent.

Then

(82) $K_d^2 = [-mnU^2 + n\sum_jV_j^2 + m\sum_kW_k^2]\,\sigma^2$,
$K_d^3 = [-mnU^3 + n\sum_jV_j^3 + m\sum_kW_k^3]\,\sigma^2(1 + 2\mu)$,
$K_d^4 = [-mnU^4 + n\sum_jV_j^4 + m\sum_kW_k^4 + 6\sum_j(V_j - U)^2\sum_k(W_k - U)^2]\,\sigma^2(1 + 6\sigma^2)$,

(83) $E_d(K_d^3)^2 = 3\big(5(m+n-1)^2 - 4(m-1)(n-1)\big)(1 + 4\sigma^2)/(mn\sigma^2)$,
$E_dK_d^4 = 3(m+n-1)^2(1 + 6\sigma^2)/(mn\sigma^2)$,
$P\{S_d = 0\} = (2\pi\sigma^2)^{-(m+n-1)/2}m^{(1-n)/2}n^{(1-m)/2} \times \exp\big([6(m-1)(n-1) - (m^2 + n^2 - 1)(1 + 1/\sigma^2)]/12mn\big)$.

Dropping terms $O(1/d)$, the exponential term is $\exp[\tfrac12 - (m/n + n/m)(1 + 1/\sigma^2)/12]$.
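The bracketed polynomials in (82) are the power sums $\sum_{j,k}t_{jk}^r$, with $t_{jk} = v_j + w_k$, rewritten in the variables (80); the prefactors $\sigma^2$, $\sigma^2(1+2\mu)$ and $\sigma^2(1+6\sigma^2)$ are the second, third and fourth cumulants of a geometric variable with mean $\mu$. The NumPy sketch below checks both facts, using random test values for $v$ and $w$ (an illustrative assumption; any values work because the identities are algebraic).

```python
import numpy as np


def power_sum_identity_error(m, n, seed=0):
    """Largest absolute error among the three identities behind (82)."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=m)
    w = rng.normal(size=n)
    t = v[:, None] + w[None, :]          # t_jk = v_j + w_k
    U = v.mean() + w.mean()              # the transform (80)
    V = v + w.mean()
    W = w + v.mean()
    rhs2 = -m * n * U**2 + n * np.sum(V**2) + m * np.sum(W**2)
    rhs3 = -m * n * U**3 + n * np.sum(V**3) + m * np.sum(W**3)
    rhs4 = (-m * n * U**4 + n * np.sum(V**4) + m * np.sum(W**4)
            + 6.0 * np.sum((V - U)**2) * np.sum((W - U)**2))
    return max(abs(np.sum(t**2) - rhs2), abs(np.sum(t**3) - rhs3),
               abs(np.sum(t**4) - rhs4))


def geometric_cumulants(mu, terms=5000):
    """Second, third and fourth cumulants of a geometric variable with mean mu."""
    q = mu / (1.0 + mu)
    k = np.arange(terms)
    p = (1.0 - q) * q**k                 # P{X = k}
    c = k - mu
    k2 = np.sum(p * c**2)
    k3 = np.sum(p * c**3)
    k4 = np.sum(p * c**4) - 3.0 * k2**2
    return k2, k3, k4


if __name__ == "__main__":
    print(power_sum_identity_error(5, 7))
    mu = 2.0
    s2 = mu * (1 + mu)
    print(geometric_cumulants(mu), (s2, s2 * (1 + 2 * mu), s2 * (1 + 6 * s2)))
```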

Now the number of points satisfying $R = r$, $C = c$ is estimated as:

(84) $Q(R = r, C = c) = P\{R = r, C = c\}\exp(I(P)) = P(S_d = 0)\,[(1+\mu)^{1+\mu}\mu^{-\mu}]^{mn}$.
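Formulas (83) and (84) combine into a closed-form log-count for $m \times n$ tables with constant margins. The sketch below is our transcription (the function names are hypothetical) and works on the log scale to avoid overflow. For $m = n = 10$, $\mu = 2$ it gives about $1.1 \times 10^{59}$; for $m = n = 30$, $\mu = 1/10$ about $2.2 \times 10^{92}$; and for $m = 3$, $n = 49$, $\mu = 49/3$ about $4 \times 10^{147}$.

```python
import math


def log_edgeworth_count(m, n, mu):
    """Natural log of the Edgeworth estimate (84) for m x n tables whose row
    sums all equal mu * n and whose column sums all equal mu * m."""
    s2 = mu * (1.0 + mu)                              # sigma^2 = mu (1 + mu)
    # Gaussian part of P{S_d = 0} in (83)
    log_gauss = (-0.5 * (m + n - 1) * math.log(2.0 * math.pi * s2)
                 + 0.5 * (1 - n) * math.log(m)
                 + 0.5 * (1 - m) * math.log(n))
    # Edgeworth exponent in (83)
    log_corr = (6.0 * (m - 1) * (n - 1)
                - (m * m + n * n - 1) * (1.0 + 1.0 / s2)) / (12.0 * m * n)
    # Entropy factor [(1 + mu)^(1 + mu) mu^(-mu)]^(mn) in (84)
    log_entropy = m * n * ((1.0 + mu) * math.log1p(mu) - mu * math.log(mu))
    return log_gauss + log_corr + log_entropy


def scientific(log_value):
    """Format exp(log_value) as 'a.bc x 10^e' without overflowing a float."""
    log10 = log_value / math.log(10.0)
    exponent = math.floor(log10)
    return f"{10 ** (log10 - exponent):.2f} x 10^{exponent}"


if __name__ == "__main__":
    for m, n, mu in ((10, 10, 2.0), (30, 30, 0.1), (3, 49, 49 / 3)):
        print(m, n, round(mu, 3), scientific(log_edgeworth_count(m, n, mu)))
```

Working with logarithms is the natural design choice here: the counts exceed the range of a double for tables only slightly larger than those shown.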
Using data from [CM07], page 5:

Table 1: Estimated number of contingency tables
with given constant row sums and constant column sums

Rows   Cols   Summand mean   Exact           Edgeworth        [CM07] 1.2
10     10     2              1.10 × 10^59    1.12 × 10^59     1.23 × 10^59
3      3      100/3          1.33 × 10^7     1.23 × 10^7      1.68 × 10^7
3      49     49/3           1.01 × 10^68    4.04 × 10^147    1.25 × 10^68
3      9      11             2.79 × 10^21    2.84 × 10^21     3.49 × 10^21
18     18     13/18          7.95 × 10^127   8.05 × 10^127    8.50 × 10^127
30     30     1/10           2.23 × 10^92    2.23 × 10^92     2.32 × 10^92

The hideously bad approximation at $m = 3$, $n = 49$, mean $= 49/3$ occurs because the $n/m$ terms in the Edgeworth correction are no longer accurate. (In [CM07], Canfield and McKay express their approximation as a correction to Good's joint hypergeometric approximation, rather than as a correction to the multivariate Gaussian approximation; this approach produces an estimate that does not involve $n/m$ terms.)
5 The number of graphs with a specified degree sequence

Consider a symmetric table of 0-1 integers $X_{ij} = X_{ji}$, $X_{ii} = 0$, $1 \le i \le n$, $1 \le j \le n$, with given row sums $D_i = \sum_j X_{ij} = d_i$. The row sums are the degrees of the undirected graph in which $X_{ij} = 1$ corresponds to an edge between nodes $i$ and $j$. As before, we use $D_i$ for a random variable and $d_i$ for a particular value. The random variables $\{D_i\}$ take values in $\{0, 1, \ldots, n-1\}^n$. We wish to estimate the number of graphs with the specified degree sequence.

The Edgeworth approximation begins with the maximum entropy distribution on $\{X_{ij}\}$ with expectations $ED_i = d_i$, which consists of independent Bernoullis with expectations $\mu_{ij}$:

(85) $P\{X_{ij} = x\} = \mu_{ij}^x(1 - \mu_{ij})^{1-x}$,
where

(86) $\log(\mu_{ij}/(1 - \mu_{ij})) = \alpha_i + \alpha_j$,
and the parameters $\alpha_i$ are chosen so that

(87) $ED_i = \sum_j\mu_{ij} = d_i$,

provided that there exist $\alpha$'s that solve these equations. See [BH10b] for conditions on the degree sequences for such $\alpha$'s to exist.
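The parameters $\alpha_i$ in (86) and (87) can be found by minimizing the convex function $\sum_{i<j}\log(1 + e^{\alpha_i + \alpha_j}) - \sum_i\alpha_id_i$, whose gradient is $\sum_j\mu_{ij} - d_i$ and whose Hessian is the covariance matrix $V_n$ of $D$. The sketch below uses damped Newton steps; it is our illustration, not the procedure of [BH10b], and it assumes the $\alpha$'s exist (otherwise it stops with an error).

```python
import numpy as np


def max_entropy_graph(d, tol=1e-10, max_iter=200):
    """Solve log(mu_ij / (1 - mu_ij)) = alpha_i + alpha_j with sum_j mu_ij = d_i."""
    d = np.asarray(d, dtype=float)
    n = d.size
    off = ~np.eye(n, dtype=bool)                 # mask for pairs i != j
    alpha = np.zeros(n)

    def probs(a):
        s = a[:, None] + a[None, :]
        mu = np.exp(-np.logaddexp(0.0, -s))      # stable logistic 1 / (1 + e^{-s})
        np.fill_diagonal(mu, 0.0)                # no loops: X_ii = 0
        return mu

    def objective(a):
        s = a[:, None] + a[None, :]
        return 0.5 * np.logaddexp(0.0, s)[off].sum() - a @ d

    for _ in range(max_iter):
        mu = probs(alpha)
        grad = mu.sum(axis=1) - d                # E D_i - d_i
        if np.max(np.abs(grad)) < tol:
            return alpha, mu
        var = mu * (1.0 - mu)                    # Bernoulli variances
        hess = var + np.diag(var.sum(axis=1))    # covariance matrix V_n of D
        step = np.linalg.solve(hess, grad)
        f0, slope, t = objective(alpha), grad @ step, 1.0
        # Backtracking line search: halve the step until it decreases enough.
        while objective(alpha - t * step) > f0 - 0.25 * t * slope and t > 1e-12:
            t *= 0.5
        alpha = alpha - t * step
    raise RuntimeError("no convergence; the alphas may not exist for this d")


if __name__ == "__main__":
    alpha, mu = max_entropy_graph([4, 4, 4, 4, 3, 3, 3, 3])
    print(np.round(mu, 4))
```

For a regular degree sequence every off-diagonal $\mu_{ij}$ equals $d/(n-1)$, which gives a simple check of the solver.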

The conditional distribution of $\{X_{ij}\}$ given the degrees $\{d_i\}$ is uniform. The number of graphs with the specified degree sequence is

(88) $q(D) = P\{D = d\}\exp[I(P)] = P(D = d)\big/\prod_{i<j}\mu_{ij}^{\mu_{ij}}(1 - \mu_{ij})^{1-\mu_{ij}}$.

The probability $P\{D = d\}$ is estimated by

(89) $P\{D = d\} = 2(2\pi)^{-n/2}|V_n|^{-1/2}\exp(-\kappa_3^2/72 + \kappa_4/24)$,

determined by the first four cumulants of $D$ following the program of Section 2.

The reason for the initial factor 2 is that the sum of the degrees is even; the lattice of all possible degree sequences has determinant $\Delta = 2$. The characteristic function over the cube $(-\pi, \pi]^n$ concentrates at $t = 0$ and also at $t = (\pi, \ldots, \pi)$; the Gaussian formula for the integral near $t = 0$ produces the same value near $t = (\pi, \ldots, \pi)$, so the total integral is twice the formula for the integral near $t = 0$. For nearly regular graphs, graphs whose degrees are in the ratio $1 + o(n^{-1/2})$, the Edgeworth formula reproduces the asymptotic formula in [MW90].

Each element of $D$ is a sum of independent Bernoullis with expectations $\{\mu_{ij}\}$. The validity of the asymptotic estimate depends on the behaviour of the characteristic function of $D - d$, with parameters $t_j$, $1 \le j \le n$, setting $t_{jk} = t_j + t_k$:

(90) $\phi_n(t) = E\{\exp(it'(D - d))\} = \prod_{j<k}\psi_{\mu_{jk}}(t_{jk}) = \prod_{j<k}e^{-i\mu_{jk}t_{jk}}(1 + \mu_{jk}(e^{it_{jk}} - 1))$.

The cumulants $K_n^r(t)$ of $t'D$ are the sums of the corresponding cumulants of the Bernoullis with expectations $\mu_{jk}$ and parameters $t_{jk} = t_j + t_k$:

(91) $K_n^2 = \sum_{j<k}t_{jk}^2\mu_{jk}(1 - \mu_{jk}) = t'V_nt$,
$K_n^3 = \sum_{j<k}t_{jk}^3\mu_{jk}(1 - \mu_{jk})(1 - 2\mu_{jk})$,
$K_n^4 = \sum_{j<k}t_{jk}^4\mu_{jk}(1 - \mu_{jk})(1 - 6\mu_{jk}(1 - \mu_{jk}))$.
Then the Edgeworth approximation terms are $\kappa_3^2 = E_n(K_n^3)^2$ and $\kappa_4 = E_n(K_n^4)$, where the expectation $E_n$ is under the assumption $t \sim N(0, V_n^{-1})$. We show in [BH10b] that the formula (88) is valid under conditions similar to those in the contingency table case, namely that the binomial expectations are relatively bounded as $n$ goes to infinity.






6 Regular graphs 

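Consider a regular graph, where the degrees all equal $d$. Then $\mu = d/(n-1)$; let $v = \mu(1 - \mu)$.

(92) $V_n(i,j) = v(1 + \delta_{ij}(n-2))$,
$|V_n| = 2(n-1)(n-2)^{n-1}v^n$,
$V_n^{-1}(i,j) = \big(\delta_{ij} - \frac{1}{2(n-1)}\big)\big/((n-2)v)$,
$E_n(t_{ij}t_{kl}) = \big(\delta_{ik} + \delta_{il} + \delta_{jk} + \delta_{jl} - \frac{2}{n-1}\big)\big/((n-2)v)$.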

These expectations may be derived directly, without inverting $V_n$, by noting that $t'V_nt \sim \chi^2_n$ has mean $n$ and variance $2n$. The final equation is used in evaluating the third and fourth cumulants, using Wick's formula:

(93) $EX^4 = 3(EX^2)^2$, $EX^3Y^3 = 9\,EX^2\,EY^2\,EXY + 6(EXY)^3$.



(94) $\kappa_3^2 = E_n(K_n^3)^2 = 6[(1 - 2\mu)^2/v][4(n-2)^2 + 1]/[n(n-1)]$,
$\kappa_4 = E_nK_n^4 = 6(1/v - 6)\,n/(n-1)$.

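For $n$ even, the estimated number of regular graphs of degree $d$ is

(95) $P\{D = d\}\exp(I(P)) = P\{D = d\}\,[\mu^\mu(1-\mu)^{1-\mu}]^{-n(n-1)/2}$, where

$P\{D = d\} = 2(2\pi v)^{-n/2}[2(n-1)(n-2)^{n-1}]^{-1/2} \times \exp\Big(-\frac{1}{12}\Big(\frac1v - 4\Big)\frac{4(n-2)^2 + 1}{n(n-1)} + \frac14\Big(\frac1v - 6\Big)\frac{n}{n-1}\Big)$,

or $P\{D = d\} = \exp\big(-\tfrac{n}{2}\log(2\pi nv) + \tfrac12\log 2 + \tfrac56 - \tfrac{1}{12v} + o(1)\big)$.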

The last formula is identical to the formula given by McKay and Wormald in [WM07]. The previous formula improves the accuracy for modest $n$ by carrying the $n - 1$ and $n - 2$ terms, which give the exact contributions from the third and fourth cumulants. Note that the approximation is symmetric about the degree $d = (n-1)/2$, $\mu = 1/2$. This is as it should be, since the number of regular graphs with degree $d$ is the same as the number of complementary regular graphs with degree $n - 1 - d$.
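Formula (95), with $\kappa_3^2$ and $\kappa_4$ taken from (94), is easy to evaluate on the log scale. The sketch below is our transcription (the function name is hypothetical); it uses $(1 - 2\mu)^2/v = 1/v - 4$. For $(n, d) = (8, 3)$, $(9, 4)$ and $(10, 3)$ it returns about 9.93, 13.88 and 16.33, which agree with the sums "exact + error" tabulated in Table 2 (9.87 + .06, 13.84 + .04, 16.23 + .10).

```python
import math


def log_regular_graph_estimate(n, d):
    """Natural log of the Edgeworth estimate (95) for labelled d-regular graphs
    on n vertices (assumes 0 < d < n - 1 and n * d even)."""
    mu = d / (n - 1)
    v = mu * (1.0 - mu)
    # (94): (1 - 2 mu)^2 / v = 1/v - 4
    kappa3_sq = 6.0 * (1.0 / v - 4.0) * (4.0 * (n - 2) ** 2 + 1.0) / (n * (n - 1))
    kappa4 = 6.0 * (1.0 / v - 6.0) * n / (n - 1)
    # log |V_n| from (92)
    log_det_v = math.log(2.0 * (n - 1)) + (n - 1) * math.log(n - 2) + n * math.log(v)
    log_p = (math.log(2.0) - 0.5 * n * math.log(2.0 * math.pi)
             - 0.5 * log_det_v - kappa3_sq / 72.0 + kappa4 / 24.0)
    # exp(I(P)) = [mu^mu (1 - mu)^(1 - mu)]^(-n(n-1)/2)
    entropy_per_edge = -(mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu))
    return log_p + 0.5 * n * (n - 1) * entropy_per_edge


if __name__ == "__main__":
    for n, d in ((8, 3), (9, 4), (10, 3)):
        print(n, d, round(log_regular_graph_estimate(n, d), 2))
```

Because every term depends on $\mu$ only through $v = \mu(1-\mu)$ and the symmetric entropy, the function returns the same value for $d$ and $n - 1 - d$, matching the complementation symmetry noted above.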

The estimated number of graphs is maximized at $\mu = 1/2$, taking the value $(2^n/\pi n)^{n/2}\exp(1/2)\sqrt2$.

This can't be too far off, since we get $2^{n(n-1)/2}$ graphs by assigning the $n(n-1)/2$ edges in all possible ways, and we would expect most of the degrees in that population of graphs to be about $d = (n-1)/2$. The other terms in the expression are the Gaussian correction to get the degrees exactly $d$, and then the Edgeworth correction that identifies a constant ratio departure from the Gaussian formula in the limit.
Table 2: Log number of labelled regular graphs
+ error in Edgeworth approximation

Vertices   Degree 3      Degree 4      Degree 5      Degree 6
8          9.87 + .06
9                        13.84 + .04
10         16.23 + .10   18.01 + .04
11                       22.37 + .05
12         23.17 + .14   26.90 + .06   28.72 + .03
13                       31.58 + .08                 35.28 + .03
14         30.60 + .18   36.42 + .09   40.18 + .04   42.04 + .03
15                       41.39 + .10                 48.98 + .03
16         38.46 + .20   46.49 + .11   52.31 + .06   56.11 + .03
17                       51.71 + .12                 63.41*
18         46.68 + .23   57.05 + .13   65.04 + .08   70.88*

• * numbers are not computed, but estimated from the Edgeworth formula

• The approximation works best when the degree is near half the number of vertices, and gets progressively worse for fixed degree as the number of vertices increases. However, the approximations are not too bad even near the edges; for example, the error for 40 vertices and degree 2 is .6 on the log scale, which is about a ratio of 2.
7 Irregular Graphs

Consider now graphs with $n_1$ vertices of degree $d_1$ and $n_2$ vertices of degree $d_2$. The maximum entropy summands are independent Bernoullis on the edges with probabilities

$p_{11}$ for the edges $(i, j)$, $1 \le i < j \le n_1$,
$p_{12}$ for the edges $(i, j)$, $1 \le i \le n_1 < j \le n_1 + n_2$,
$p_{22}$ for the edges $(i, j)$, $n_1 < i < j \le n_1 + n_2$.
The maximum entropy choice of the $p$'s is the unique solution, when it exists, to

(96) $(n_1 - 1)p_{11} + n_2p_{12} = d_1$,
$(n_2 - 1)p_{22} + n_1p_{12} = d_2$,
$\frac{p_{11}}{1 - p_{11}}\cdot\frac{p_{22}}{1 - p_{22}} = \Big(\frac{p_{12}}{1 - p_{12}}\Big)^2$.

The Bernoulli variances are $v_{ij} = p_{ij}(1 - p_{ij})$. The random degrees $D_i$ have covariance matrix $V$:

(97) $V_{ii} = (n_1 - 1)v_{11} + n_2v_{12}$, $1 \le i \le n_1$,
$V_{ii} = (n_2 - 1)v_{22} + n_1v_{12}$, $n_1 < i \le n_1 + n_2$,
$V_{ij} = v_{11}$, $1 \le i \ne j \le n_1$,
$V_{ij} = v_{12}$, $1 \le i \le n_1 < j \le n_1 + n_2$,
$V_{ij} = v_{22}$, $n_1 < i \ne j \le n_1 + n_2$,
$|V| = ((n_1 - 2)v_{11} + n_2v_{12})^{n_1 - 1}((n_2 - 2)v_{22} + n_1v_{12})^{n_2 - 1} \times [((2n_1 - 2)v_{11} + n_2v_{12})((2n_2 - 2)v_{22} + n_1v_{12}) - n_1n_2v_{12}^2]$.
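The determinant in (97) can be checked against a direct construction of $V$ as $\sum_{i<j}v_{ij}(e_i + e_j)(e_i + e_j)'$, the covariance of the degree vector $D = \sum_{i<j}X_{ij}(e_i + e_j)$ under independent Bernoulli edges. The NumPy sketch below uses arbitrary test values of $v_{11}, v_{12}, v_{22}$, chosen only for illustration and not tied to any degree sequence.

```python
import numpy as np


def degree_covariance(n1, n2, v11, v12, v22):
    """Covariance of D = sum_{i<j} X_ij (e_i + e_j), Var X_ij set by vertex class."""
    n = n1 + n2
    V = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            if j < n1:
                var = v11          # both endpoints in class 1
            elif i >= n1:
                var = v22          # both endpoints in class 2
            else:
                var = v12          # one endpoint in each class
            e = np.zeros(n)
            e[i] = e[j] = 1.0
            V += var * np.outer(e, e)
    return V


def det_formula_97(n1, n2, v11, v12, v22):
    """Closed form for |V| from (97)."""
    a1 = (n1 - 2) * v11 + n2 * v12
    a2 = (n2 - 2) * v22 + n1 * v12
    b = (((2 * n1 - 2) * v11 + n2 * v12) * ((2 * n2 - 2) * v22 + n1 * v12)
         - n1 * n2 * v12 ** 2)
    return a1 ** (n1 - 1) * a2 ** (n2 - 1) * b


if __name__ == "__main__":
    args = (3, 5, 0.2, 0.25, 0.15)
    print(np.linalg.det(degree_covariance(*args)), det_formula_97(*args))
```

The closed form reflects the eigenstructure of $V$: eigenvalue $(n_1 - 2)v_{11} + n_2v_{12}$ on vectors summing to zero within class 1, the analogous value for class 2, and a $2 \times 2$ block on the class indicators.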

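In the case where $n_1 = n_2 = n/2$, $d_2 = n - d_1 - 1$, $n/4 < d_1 < 3n/4$, we have $p_{12} = 1/2$, $p_{11} = 1 - p_{22} = (d_1 - n/4)/(n/2 - 1)$, $v_{11} = v_{22}$, $v_{12} = 1/4$, and the covariances of the $t_{ij} = t_i + t_j$ needed for the Edgeworth terms are:

(98) $A = (n/2 - 2)v_{11} + n/8$,
$Q = ((n-2)v_{11} + n/8)^2 - (n/8)^2$,
$V^{-1}_{ii} = 1/A + V^{-1}_{\mathrm{w}}$,
$V^{-1}_{\mathrm{w}} = V^{-1}_{ij} = \{n/32 - v_{11}[(n-2)v_{11} + n/8]\}/(AQ)$ for $i \ne j$ in the same half ($1 \le i < j \le n/2$ or $n/2 < i < j \le n$),
$V^{-1}_{\mathrm{b}} = V^{-1}_{ij} = -1/(4Q)$, $1 \le i \le n/2 < j \le n$,
$|V| = ((n/2 - 2)v_{11} + n/8)^{n-2}Q$,
$N_{ij} = \{1 \le i \le n/2\}\{n/2 < j \le n\} + \{n/2 < i \le n\}\{1 \le j \le n/2\}$,
$E_nt_{ij}t_{kl} = 4V^{-1}_{\mathrm{w}} + (\delta_{ik} + \delta_{il} + \delta_{jk} + \delta_{jl})/A + (V^{-1}_{\mathrm{b}} - V^{-1}_{\mathrm{w}})(N_{ik} + N_{il} + N_{jk} + N_{jl})$,

(99) $K_n^3 = v_{11}(1 - 2p_{11})\big(\sum_{1 \le j < k \le n/2}t_{jk}^3 - \sum_{n/2 < j < k \le n}t_{jk}^3\big)$,
$K_n^4 = v_{11}(1 - 6v_{11})\big(\sum_{1 \le j < k \le n/2}t_{jk}^4 + \sum_{n/2 < j < k \le n}t_{jk}^4\big) - \tfrac18\sum_{1 \le j \le n/2 < k \le n}t_{jk}^4$.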

The Gaussian approximation:

(100) $Q_G\{D = d\} = 2\exp\big[-\tfrac{n(n-2)}{4}(p_{11}\log p_{11} + p_{22}\log p_{22}) + \tfrac{n^2}{4}\log 2\big](2\pi)^{-n/2}|V|^{-1/2}$.

The initial 2 is the determinant of the lattice of possible degree sequences. The second term is the contribution from the Bernoulli probabilities, the exponential value of the entropy. The last term is the Gaussian contribution for the probability that $D = d$. The Edgeworth correction multiplies by the factor $\exp(-\kappa_3^2/72 + \kappa_4/24)$, computed by $\kappa_3^2 = E_n(K_n^3)^2$, $\kappa_4 = E_nK_n^4$, where the expectation is taken under the assumption $t \sim N(0, V^{-1})$.
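In the symmetric case of (98), the Gaussian approximation (100) needs only $p_{11}$, $A$ and $Q$, since $|V| = A^{n-2}Q$. The sketch below is our transcription (the function name is hypothetical) and evaluates $\log Q_G$ without the Edgeworth factor; for the degree sequences 44443333 and 666666555555 it gives about 10.22 and 29.03, the values in the Gauss column of Table 3.

```python
import math


def log_gauss_two_class(n, d1):
    """log Q_G{D = d} from (100) for n/2 vertices of degree d1 and n/2 vertices
    of degree n - d1 - 1 (assumes n even and n/4 < d1 < 3n/4)."""
    p11 = (d1 - n / 4) / (n / 2 - 1)
    p22 = 1.0 - p11
    v11 = p11 * (1.0 - p11)
    A = (n / 2 - 2) * v11 + n / 8
    Q = ((n - 2) * v11 + n / 8) ** 2 - (n / 8) ** 2
    log_det_v = (n - 2) * math.log(A) + math.log(Q)       # |V| = A^(n-2) Q
    # Entropy: n(n-2)/4 within-class edges, n^2/4 between-class edges (p12 = 1/2)
    log_entropy = (-(n * (n - 2) / 4) * (p11 * math.log(p11) + p22 * math.log(p22))
                   + (n * n / 4) * math.log(2.0))
    return (math.log(2.0) + log_entropy
            - 0.5 * n * math.log(2.0 * math.pi) - 0.5 * log_det_v)


if __name__ == "__main__":
    print(round(log_gauss_two_class(8, 4), 2), round(log_gauss_two_class(12, 6), 2))
```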
Table 3: Log number of graphs with irregular degree sequences

Degree sequence    Exact   Gauss   Edgeworth
44443333           9.59    10.22   9.64
666666555555       28.45   29.03   28.46
77777774444444     24.21   24.83   24.33

The Edgeworth formula is significantly more accurate than the Gaussian formula. 
The Edgeworth formula is more accurate when the degrees are nearly equal. 